Preface


About the Text


This book was written for a sequence of courses on the theory and application of numerical 
approximation techniques. It is designed primarily for junior-level mathematics, science, 
and engineering majors who have completed at least the standard college calculus sequence. 
Familiarity with the fundamentals of linear algebra and differential equations is useful, but 
there is sufficient introductory material on these topics so that courses in these subjects are 
not needed as prerequisites. 

Previous editions of Numerical Analysis have been used in a wide variety of situations. 
In some cases, the mathematical analysis underlying the development of approximation 
techniques was given more emphasis than the methods; in others, the emphasis was re- 
versed. The book has been used as a core reference for beginning graduate level courses 
in engineering and computer science programs and in first-year courses in introductory 
analysis offered at international universities. We have adapted the book to fit these diverse 
users without compromising our original purpose: 


To introduce modern approximation techniques; to explain how, why, and when they 
can be expected to work; and to provide a foundation for further study of numerical 
analysis and scientific computing. 


The book contains sufficient material for at least a full year of study, but we expect many 
people to use it for only a single-term course. In such a single-term course, students learn 
to identify the types of problems that require numerical techniques for their solution and 
see examples of the error propagation that can occur when numerical methods are applied. 
They accurately approximate the solution of problems that cannot be solved exactly and 
learn typical techniques for estimating error bounds for the approximations. The remainder 
of the text then serves as a reference for methods not considered in the course. Either the 
full-year or single-course treatment is consistent with the philosophy of the text. 

Virtually every concept in the text is illustrated by example, and this edition contains 
more than 2600 class-tested exercises ranging from elementary applications of methods 
and algorithms to generalizations and extensions of the theory. In addition, the exercise 
sets include numerous applied problems from diverse areas of engineering as well as from 
the physical, computer, biological, economic, and social sciences. The chosen applications 
clearly and concisely demonstrate how numerical techniques can be, and often must be, 
applied in real-life situations. 

A number of software packages, known as Computer Algebra Systems (CAS), have 
been developed to produce symbolic mathematical computations. Maple®, Mathematica®, 
and MATLAB® are predominant among these in the academic environment, and versions 
of these software packages are available for most common computer systems. In addition, 
Sage, a free open source system, is now available. This system was developed primarily
by William Stein at the University of Washington, and was first released in February 2005.
Information about Sage can be found at the site 


http://www.sagemath.org . 


Although there are differences among the packages, both in performance and price, all can 
perform standard algebra and calculus operations. 

The results in most of our examples and exercises have been generated using problems 
for which exact solutions are known, because this permits the performance of the approxi- 
mation method to be more easily monitored. For many numerical techniques the error 
analysis requires bounding a higher ordinary or partial derivative, which can be a tedious 
task and one that is not particularly instructive once the techniques of calculus have been 
mastered. Having a symbolic computation package available can be very useful in the study 
of approximation techniques, because exact values for derivatives can easily be obtained. A 
little insight often permits a symbolic computation to aid in the bounding process as well. 

We have chosen Maple as our standard package because of its wide academic distri- 
bution and because it now has a NumericalAnalysis package that contains programs that 
parallel the methods and algorithms in our text. However, other CAS can be substituted with 
only minor modifications. Examples and exercises have been added whenever we felt that 
a CAS would be of significant benefit, and we have discussed the approximation methods 
that CAS employ when they are unable to solve a problem exactly. 


Algorithms and Programs


In our first edition we introduced a feature that at the time was innovative and somewhat 
controversial. Instead of presenting our approximation techniques in a specific programming 
language (FORTRAN was dominant at the time), we gave algorithms in a pseudo code that 
would lead to a well-structured program in a variety of languages. The programs are coded 
and available online in most common programming languages and CAS worksheet formats. 
All of these are on the web site for the book: 


http://www.math.ysu.edu/~faires/Numerical-Analysis/ . 


For each algorithm there is a program written in FORTRAN, Pascal, C, and Java. In addition, 
we have coded the programs using Maple, Mathematica, and MATLAB. This should ensure 
that a set of programs is available for most common computing systems. 

Every program is illustrated with a sample problem that is closely correlated to the text. 
This permits the program to be run initially in the language of your choice to see the form 
of the input and output. The programs can then be modified for other problems by making 
minor changes. The forms of the input and output are, as nearly as possible, the same in
each of the programming systems. This permits an instructor using the programs to discuss 
them generically, without regard to the particular programming system an individual student 
chooses to use. 

The programs are designed to run on a minimally configured computer and are given in
ASCII format for flexibility of use. This permits them to be altered using any editor or word 
processor that creates standard ASCII files (commonly called “Text Only” files). Extensive 
README files are included with the program files so that the peculiarities of the various 
programming systems can be individually addressed. The README files are presented 
both in ASCII format and as PDF files. As new software is developed, the programs will 
be updated and placed on the web site for the book. 

For most of the programming systems the appropriate software is needed, such as a 
compiler for Pascal, FORTRAN, and C, or one of the computer algebra systems (Maple,
Mathematica, and MATLAB). The Java implementations are an exception. You need the
system to run the programs, but Java can be freely downloaded from various sites. The best 
way to obtain Java is to use a search engine to search on the name, choose a download site, 
and follow the instructions for that site. 


New for This Edition


The first edition of this book was published more than 30 years ago, in the decade after major 
advances in numerical techniques were made to reflect the new widespread availability of 
computer equipment. In our revisions of the book we have added new techniques in order 
to keep our treatment current. To continue this trend, we have made a number of significant 
changes to the ninth edition. 


• Our treatment of Numerical Linear Algebra has been extensively expanded, and constitutes
one of the major changes in this edition. In particular, a section on Singular Value
Decomposition has been added at the end of Chapter 9. This required a complete rewrite 
of the early part of Chapter 9 and considerable expansion of Chapter 6 to include neces- 
sary material concerning symmetric and orthogonal matrices. Chapter 9 is approximately 
40% longer than in the eighth edition, and contains a significant number of new examples 
and exercises. Although students would certainly benefit from a course in Linear Algebra 
before studying this material, sufficient background material is included in the book, and 
every result whose proof is not given is referenced to at least one commonly available 
source. 


• All the Examples in the book have been rewritten to better emphasize the problem to
be solved before the specific solution is presented. Additional steps have been added to 
many of the examples to explicitly show the computations required for the first steps of 
iteration processes. This gives the reader a way to test and debug programs they have 
written for problems similar to the examples. 


• A new item designated as an Illustration has been added. This is used when discussing a
specific application of a method not suitable for the problem statement-solution format 
of the Examples. 


• The Maple code we include now follows, whenever possible, the material included in
the Maple NumericalAnalysis package. The statements given in the text are precisely what is
needed for the Maple worksheet applications, and the output is given in the same font 
and color format that Maple produces. 


• A number of sections have been expanded, and some divided, to make it easier for instructors
to assign problems immediately after the material is presented. This is particularly
true in Chapters 3, 6, 7, and 9. 


• Numerous new historical notes have been added, primarily in the margins where they
can be considered independent of the text material. Much of the current material used in
Numerical Analysis was developed in the middle of the 20th century, and students should be
aware that mathematical discoveries are ongoing. 


• The bibliographic material has been updated to reflect new editions of books that we
reference. New sources have been added that were not previously available. 


As always with our revisions, every sentence was examined to determine if it was phrased 
in a manner that best relates what is described.


Supplements


A Student Solutions Manual and Study Guide (ISBN-10: 0-538-73351-9; ISBN-13: 978-0-538-73351-9)
is available for purchase with this edition, and contains worked-out solutions
to many of the problems. The solved exercises cover all of the techniques discussed in the 
text, and include step-by-step instructions for working through the algorithms. The first two 
chapters of this Guide are available for preview on the web site for the book. 

Complete solutions to all exercises in the text are available to instructors in secure, 
customizable online format through the Cengage Solution Builder service. Adopting in- 
structors can sign up for access at www.cengage.com/solutionbuilder. Computation results 
in these solutions were regenerated for this edition using the programs on the web site to 
ensure compatibility among the various programming systems. 

A set of classroom lecture slides, prepared by Professor John Carroll of Dublin City
University, is available on the book’s instructor companion web site at
www.cengage.com/math/burden. These slides, created using the Beamer package of LaTeX, are in PDF
format. They present examples, hints, and step-by-step animations of important techniques 
in Numerical Analysis. 


Possible Course Suggestions


Numerical Analysis is designed to give instructors flexibility in the choice of topics as well 
as in the level of theoretical rigor and in the emphasis on applications. In line with these 
aims, we provide detailed references for results not demonstrated in the text and for the 
applications used to indicate the practical importance of the methods. The text references 
cited are those most likely to be available in college libraries, and they have been updated to 
reflect recent editions. We also include quotations from original research papers when we 
feel this material is accessible to our intended audience. All referenced material has been 
indexed to the appropriate locations in the text, and Library of Congress information for 
reference material has been included to permit easy location if searching for library material. 

The following flowchart indicates chapter prerequisites. Most of the possible sequences 
that can be generated from this chart have been taught by the authors at Youngstown State
University.

[Flowchart of chapter prerequisites for Chapters 1 through 12.]

The additional material in this edition should permit instructors to prepare an undergraduate
course in Numerical Linear Algebra for students who have not previously studied
Numerical Analysis. This could be done by covering Chapters 1, 6, 7, and 9, and then, as 
time permits, including other material of the instructor’s choice.


Mathematical Preliminaries
and Error Analysis 


Introduction 
In beginning chemistry courses, we see the ideal gas law, 
PV =NRT, 


which relates the pressure P, volume V, temperature T, and number of moles N of an 
“ideal” gas. In this equation, R is a constant that depends on the measurement system.

Suppose two experiments are conducted to test this law, using the same gas in each 
case. In the first experiment, 


P = 1.00 atm, V = 0.100 m³,
N = 0.00420 mol, R = 0.08206.


The ideal gas law predicts the temperature of the gas to be 


\[ T = \frac{PV}{NR} = \frac{(1.00)(0.100)}{(0.00420)(0.08206)} = 290.15\ \text{K} = 17^{\circ}\text{C}. \]


When we measure the temperature of the gas, however, we find that the true temperature is
15°C.


We then repeat the experiment using the same values of R and N, but increase the 
pressure by a factor of two and reduce the volume by the same factor. The product PV 
remains the same, so the predicted temperature is still 17°C. But now we find that the actual
temperature of the gas is 19°C.
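The two predictions can be reproduced with a few lines of Python. This is only an illustrative sketch: the function name `ideal_gas_temperature` is our own, and the second experiment is assumed to use exactly P = 2.00 atm and V = 0.050 m³, which is what doubling the pressure and halving the volume gives.

```python
def ideal_gas_temperature(P, V, N, R=0.08206):
    """Temperature predicted by PV = NRT, in kelvins."""
    return P * V / (N * R)

# First experiment, with the values given in the text.
T1 = ideal_gas_temperature(1.00, 0.100, 0.00420)

# Second experiment: pressure doubled, volume halved (assumed exact factors).
T2 = ideal_gas_temperature(2.00, 0.050, 0.00420)

print(f"T1 = {T1:.2f} K = {T1 - 273.15:.1f} C")
print(f"T2 = {T2:.2f} K = {T2 - 273.15:.1f} C")
```

Both calls depend on P and V only through the product PV, which is why the prediction cannot distinguish the two experiments.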




Clearly, the ideal gas law is suspect, but before concluding that the law is invalid in 
this situation, we should examine the data to see whether the error could be attributed to 
the experimental results. If so, we might be able to determine how much more accurate 
our experimental results would need to be to ensure that an error of this magnitude did not 
occur. 

Analysis of the error involved in calculations is an important topic in numerical analysis 
and is introduced in Section 1.2. This particular application is considered in Exercise 28 of 
that section. 

This chapter contains a short review of those topics from single-variable calculus that 
will be needed in later chapters. A solid knowledge of calculus is essential for an understanding
of the analysis of numerical techniques, and a more thorough review might be needed if
you have been away from this subject for a while. In addition, there is an introduction to
convergence, error analysis, the machine representation of numbers, and some techniques 
for categorizing and minimizing computational error. 


1.1 Review of Calculus


Limits and Continuity 


The concepts of limit and continuity of a function are fundamental to the study of calculus, 
and form the basis for the analysis of numerical techniques. 


Definition 1.1 A function $f$ defined on a set $X$ of real numbers has the limit $L$ at $x_0$, written

\[ \lim_{x \to x_0} f(x) = L, \]

if, given any real number $\varepsilon > 0$, there exists a real number $\delta > 0$ such that

\[ |f(x) - L| < \varepsilon, \quad \text{whenever } x \in X \text{ and } 0 < |x - x_0| < \delta. \]

(See Figure 1.1.)
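The definition can be explored numerically. The sketch below is not from the text: it assumes the example f(x) = x² at x₀ = 2 with L = 4, and the choice δ = min(1, ε/5), which works because |x + 2| < 5 whenever |x − 2| < 1. Sampling points is of course not a proof; it only illustrates what the definition requires.

```python
def delta_for(eps):
    """A delta that works for f(x) = x**2 at x0 = 2 (illustrative choice)."""
    return min(1.0, eps / 5.0)

def check(eps, samples=1000):
    """Sample points with 0 < |x - 2| < delta and test |f(x) - 4| < eps."""
    delta = delta_for(eps)
    for k in range(1, samples + 1):
        h = delta * k / (samples + 1)          # 0 < h < delta
        for x in (2.0 - h, 2.0 + h):
            if not abs(x * x - 4.0) < eps:
                return False
    return True

results = {eps: check(eps) for eps in (1.0, 0.1, 1e-3, 1e-6)}
print(results)
```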


Figure 1.1 


Copyright 2010 Cengage Learning. All Rights Reserved. May not be copied, scanned, or duplicated, in whole or in part. Due to electronic rights, some third party content may be suppressed from the eBook and/or eChapter(s). 
Editorial review has deemed that any suppressed content does not materially affect the overall learning experience. Cengage Learning reserves the right to remove additional content at any time if subsequent rights restrictions require it. 


Definition 1.2 Let $f$ be a function defined on a set $X$ of real numbers and $x_0 \in X$. Then $f$ is
continuous at $x_0$ if

\[ \lim_{x \to x_0} f(x) = f(x_0). \]

The function $f$ is continuous on the set $X$ if it is continuous at each number in $X$.


The basic concepts of calculus and its applications were developed in the late 17th and
early 18th centuries, but the mathematically precise concepts of limits and continuity were not
described until the time of Augustin Louis Cauchy (1789-1857), Heinrich Eduard Heine (1821-1881),
and Karl Weierstrass (1815-1897) in the latter portion of the 19th century.


The set of all functions that are continuous on the set $X$ is denoted $C(X)$. When $X$ is
an interval of the real line, the parentheses in this notation are omitted. For example, the
set of all functions continuous on the closed interval $[a, b]$ is denoted $C[a, b]$. The symbol
$\mathbb{R}$ denotes the set of all real numbers, which also has the interval notation $(-\infty, \infty)$. So
the set of all functions that are continuous at every real number is denoted by $C(\mathbb{R})$ or by
$C(-\infty, \infty)$.

The limit of a sequence of real or complex numbers is defined in a similar manner.


Definition 1.3 Let $\{x_n\}_{n=1}^{\infty}$ be an infinite sequence of real numbers. This sequence has the
limit $x$ (converges to $x$) if, for any $\varepsilon > 0$, there exists a positive integer $N(\varepsilon)$ such that
$|x_n - x| < \varepsilon$ whenever $n > N(\varepsilon)$. The notation

\[ \lim_{n \to \infty} x_n = x, \quad \text{or} \quad x_n \to x \ \text{as} \ n \to \infty, \]

means that the sequence $\{x_n\}_{n=1}^{\infty}$ converges to $x$.


Theorem 1.4 If $f$ is a function defined on a set $X$ of real numbers and $x_0 \in X$, then the following
statements are equivalent:

a. $f$ is continuous at $x_0$;

b. If $\{x_n\}_{n=1}^{\infty}$ is any sequence in $X$ converging to $x_0$, then $\lim_{n \to \infty} f(x_n) = f(x_0)$.


The functions we will consider when discussing numerical methods will be assumed
to be continuous because this is a minimal requirement for predictable behavior. Functions
that are not continuous can skip over points of interest, which can cause difficulties when
attempting to approximate a solution to a problem.


Differentiability


More sophisticated assumptions about a function generally lead to better approximation
results. For example, a function with a smooth graph will normally behave more predictably
than one with numerous jagged features. The smoothness condition relies on the concept
of the derivative.


Definition 1.5 Let $f$ be a function defined in an open interval containing $x_0$. The function $f$ is
differentiable at $x_0$ if

\[ f'(x_0) = \lim_{x \to x_0} \frac{f(x) - f(x_0)}{x - x_0} \]

exists. The number $f'(x_0)$ is called the derivative of $f$ at $x_0$. A function that has a derivative
at each number in a set $X$ is differentiable on $X$.


The derivative of $f$ at $x_0$ is the slope of the tangent line to the graph of $f$ at $(x_0, f(x_0))$,
as shown in Figure 1.2.
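The limit that defines the derivative can be watched numerically. The following Python sketch is an illustration of our own, assuming f(x) = sin x and x₀ = 1 (neither is taken from the text); it evaluates the difference quotient at x = x₀ + h for decreasing h and compares it with the known derivative cos 1.

```python
import math

def difference_quotient(f, x0, h):
    """(f(x) - f(x0)) / (x - x0) evaluated at x = x0 + h."""
    return (f(x0 + h) - f(x0)) / h

x0 = 1.0
exact = math.cos(x0)                     # derivative of sin at x0

# Quotients for h = 1e-1, 1e-2, ..., 1e-6.
quotients = [difference_quotient(math.sin, x0, 10.0 ** -k) for k in range(1, 7)]

for k, q in enumerate(quotients, start=1):
    print(f"h = 1e-{k}: quotient = {q:.8f}, error = {abs(q - exact):.2e}")
```

For much smaller h the quotient stops improving, because the subtraction in the numerator loses significant digits in floating-point arithmetic.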
[Figure 1.2: graph of $y = f(x)$ with the tangent line at $(x_0, f(x_0))$; the tangent line has slope $f'(x_0)$.]


Theorem 1.6 If the function $f$ is differentiable at $x_0$, then $f$ is continuous at $x_0$.


The next theorems are of fundamental importance in deriving methods for error estimation.
The proofs of these theorems and the other unreferenced results in this section can
be found in any standard calculus text.

The set of all functions that have $n$ continuous derivatives on $X$ is denoted $C^n(X)$, and
the set of functions that have derivatives of all orders on $X$ is denoted $C^{\infty}(X)$. Polynomial,
rational, trigonometric, exponential, and logarithmic functions are in $C^{\infty}(X)$, where $X$
consists of all numbers for which the functions are defined. When $X$ is an interval of the
real line, we will again omit the parentheses in this notation.


Theorem 1.7 (Rolle’s Theorem)
Suppose $f \in C[a, b]$ and $f$ is differentiable on $(a, b)$. If $f(a) = f(b)$, then a number $c$ in
$(a, b)$ exists with $f'(c) = 0$. (See Figure 1.3.)


[Figure 1.3: illustration of Rolle’s Theorem, with $f(a) = f(b)$.]


The theorem attributed to Michel Rolle (1652-1719) appeared in 1691 in a little-known treatise
entitled Méthode pour résoudre les égalités. Rolle originally criticized the calculus that was
developed by Isaac Newton and Gottfried Leibniz, but later became one of its proponents.


Theorem 1.8 (Mean Value Theorem)
If $f \in C[a, b]$ and $f$ is differentiable on $(a, b)$, then a number $c$ in $(a, b)$ exists with (See
Figure 1.4.)

\[ f'(c) = \frac{f(b) - f(a)}{b - a}. \]
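The Mean Value Theorem guarantees that such a number c exists but does not say how to find it. As a hedged illustration, the Python sketch below assumes the example f(x) = x³ on [0, 2], which is our own choice, and locates c by repeatedly halving an interval on which f′(c) minus the secant slope changes sign. For this example the exact answer is c = 2/√3.

```python
import math

def f(x):
    return x ** 3                      # illustrative function, not from the text

def fprime(x):
    return 3 * x ** 2                  # its derivative, computed by hand

a, b = 0.0, 2.0
slope = (f(b) - f(a)) / (b - a)        # slope of the secant line through the endpoints

# g(c) = f'(c) - slope is negative at a and positive at b for this example,
# so halving the interval while keeping the sign change traps a root of g.
lo, hi = a, b
for _ in range(60):
    mid = (lo + hi) / 2
    if (fprime(lo) - slope) * (fprime(mid) - slope) <= 0:
        hi = mid
    else:
        lo = mid
c = (lo + hi) / 2

print(f"secant slope = {slope}, c = {c:.10f}, f'(c) = {fprime(c):.10f}")
```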
[Figure 1.4: illustration of the Mean Value Theorem. The tangent line with slope $f'(c)$ is parallel
to the secant line with slope $\frac{f(b) - f(a)}{b - a}$.]


Theorem 1.9 (Extreme Value Theorem)
If $f \in C[a, b]$, then $c_1, c_2 \in [a, b]$ exist with $f(c_1) \le f(x) \le f(c_2)$, for all $x \in [a, b]$.
In addition, if $f$ is differentiable on $(a, b)$, then the numbers $c_1$ and $c_2$ occur either at the
endpoints of $[a, b]$ or where $f'$ is zero. (See Figure 1.5.)


[Figure 1.5: illustration of the Extreme Value Theorem.]


As mentioned in the preface, we will use the computer algebra system Maple whenever
appropriate. Computer algebra systems are particularly useful for symbolic differentiation
and plotting graphs. Both techniques are illustrated in Example 1.


Research work on the design of algorithms and systems for performing symbolic
mathematics began in the 1960s. The first system to be operational, in the 1970s, was a
LISP-based system called MACSYMA.


Example 1 Use Maple to find the absolute minimum and absolute maximum values of

\[ f(x) = 5 \cos 2x - 2x \sin 2x \]

on the intervals (a) $[1, 2]$, and (b) $[0.5, 1]$.


Solution There is a choice of Text input or Math input under the Maple C 2D Math option.
The Text input is used to document worksheets by adding standard text information in
the document. The Math input option is used to execute Maple commands. Maple input
can either be typed or selected from the palettes at the left of the Maple screen. We will
show the input as typed because it is easier to accurately describe the commands. For palette
input instructions you should consult the Maple tutorials. In our presentation, Maple input
commands appear in italic type, and Maple responses appear in cyan type.


The Maple development project began at the University of Waterloo in late 1980. Its goal
was to be accessible to researchers in mathematics, engineering, and science, but
additionally to students for educational purposes. To be effective it needed to be portable,
as well as space and time efficient. Demonstrations of the system were presented in 1982,
and the major paper setting out the design criteria for the MAPLE system was presented in
1983 [CGGG].

To ensure that the variables we use have not been previously assigned, we first issue
the command


restart 


to clear the Maple memory. We first illustrate the graphing capabilities of Maple. To access 
the graphing package, enter the command 


with(plots) 


to load the plots subpackage. Maple responds with a list of available commands in the 
package. This list can be suppressed by placing a colon after the with(plots) command. 
The following command defines f(x) = 5 cos 2x − 2x sin 2x as a function of x.


f := x → 5 cos(2x) − 2x · sin(2x)

and Maple responds with

x → 5 cos(2x) − 2x sin(2x)

We can plot the graph of f on the interval [0.5, 2] with the command

plot(f, 0.5 .. 2)


Figure 1.6 shows the screen that results from this command after doing a mouse click on 
the graph. This click tells Maple to enter its graph mode, which presents options for various 
views of the graph. We can determine the coordinates of a point of the graph by moving the 
mouse cursor to the point. The coordinates appear in the box above the left of the plot(f, 
0.5 .. 2) command. This feature is useful for estimating the axis intercepts and extrema of 
functions. 

[Figure 1.6: Maple worksheet showing the graph produced by the plot command.]

The absolute maximum and minimum values of f(x) on the interval [a, b] can occur
only at the endpoints, or at a critical point.


(a) When the interval is [1, 2] we have

f(1) = 5 cos 2 − 2 sin 2 = −3.899329036 and f(2) = 5 cos 4 − 4 sin 4 = −0.241008123.


A critical point occurs when f′(x) = 0. To use Maple to find this point, we first define a
function fp to represent f′ with the command


fp := x → diff(f(x), x)

and Maple responds with

x → d/dx f(x)

To find the explicit representation of f′(x) we enter the command


fp(x)


and Maple gives the derivative as

−12 sin(2x) − 4x cos(2x)

To determine the critical point we use the command


fsolve(fp(x), x, 1 .. 2)


and Maple tells us that f′(x) = fp(x) = 0 for x in [1, 2] when x is

1.358229874


We evaluate f(x) at this point with the command 


f(%) 


The % is interpreted as the last Maple response. The value of f at the critical point is 
−5.675301338


As a consequence, the absolute maximum value of f(x) in [1, 2] is f(2) = −0.241008123
and the absolute minimum value is f(1.358229874) = −5.675301338, accurate at least to
the places listed.
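Readers without access to Maple can reproduce part (a) with a short Python sketch. It is not the approach used in the text: the derivative is entered by hand rather than computed symbolically, and the helper `bisect` is a simple interval-halving search of our own that stands in for fsolve. It assumes that the derivative changes sign on the interval, which is true on [1, 2].

```python
import math

def f(x):
    return 5 * math.cos(2 * x) - 2 * x * math.sin(2 * x)

def fp(x):
    # Derivative of f, entered by hand: -12 sin(2x) - 4x cos(2x).
    return -12 * math.sin(2 * x) - 4 * x * math.cos(2 * x)

def bisect(g, lo, hi, tol=1e-12):
    """Root of g in [lo, hi], assuming g(lo) and g(hi) have opposite signs."""
    g_lo = g(lo)
    while hi - lo > tol:
        mid = (lo + hi) / 2
        g_mid = g(mid)
        if g_lo * g_mid <= 0:
            hi = mid                    # sign change is in the left half
        else:
            lo, g_lo = mid, g_mid       # sign change is in the right half
    return (lo + hi) / 2

a, b = 1.0, 2.0
c = bisect(fp, a, b)                    # critical point in [1, 2]

# The extreme values occur at an endpoint or at the critical point.
candidates = [a, c, b]
values = [f(x) for x in candidates]
x_min = candidates[values.index(min(values))]
x_max = candidates[values.index(max(values))]

print(f"critical point c = {c:.9f}, f(c) = {f(c):.9f}")
print(f"minimum at x = {x_min:.9f}, maximum at x = {x_max:.9f}")
```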


(b) When the interval is [0.5, 1] we have the values at the endpoints given by

f(0.5) = 5 cos 1 − 1 sin 1 = 1.860040545 and f(1) = 5 cos 2 − 2 sin 2 = −3.899329036.


However, when we attempt to determine the critical point in the interval [0.5, 1] with the
command


fsolve(fp(x), x, 0.5 .. 1)
Maple gives the response

fsolve(−12 sin(2x) − 4x cos(2x), x, .5 .. 1)


This indicates that Maple is unable to determine the solution. The reason is obvious once
the graph in Figure 1.6 is considered. The function f is always decreasing on this interval,
so no solution exists. Be suspicious when Maple returns the same response it is given; it is
as if it were questioning your request.

In summary, on [0.5, 1] the absolute maximum value is f(0.5) = 1.860040545 and
the absolute minimum value is f(1) = −3.899329036, accurate at least to the places
listed.


The following theorem is not generally presented in a basic calculus course, but is
derived by applying Rolle’s Theorem successively to $f, f', \ldots$, and, finally, to $f^{(n-1)}$.
This result is considered in Exercise 23.


Theorem 1.10 (Generalized Rolle’s Theorem)
Suppose $f \in C[a, b]$ is $n$ times differentiable on $(a, b)$. If $f(x) = 0$ at the $n + 1$ distinct
numbers $a \le x_0 < x_1 < \cdots < x_n \le b$, then a number $c$ in $(x_0, x_n)$, and hence in $(a, b)$,
exists with $f^{(n)}(c) = 0$.


We will also make frequent use of the Intermediate Value Theorem. Although its statement
seems reasonable, its proof is beyond the scope of the usual calculus course. It can,
however, be found in most analysis texts.


Theorem 1.11 (Intermediate Value Theorem)
If $f \in C[a, b]$ and $K$ is any number between $f(a)$ and $f(b)$, then there exists a number $c$
in $(a, b)$ for which $f(c) = K$.


Figure 1.7 shows one choice for the number that is guaranteed by the Intermediate
Value Theorem. In this example there are two other possibilities.


[Figure 1.7: graph of $y = f(x)$ between the points $(a, f(a))$ and $(b, f(b))$.]


Example 2 Show that $x^5 - 2x^3 + 3x^2 - 1 = 0$ has a solution in the interval $[0, 1]$.


Solution Consider the function defined by $f(x) = x^5 - 2x^3 + 3x^2 - 1$. The function $f$ is
continuous on $[0, 1]$. In addition,
\[ f(0) = -1 < 0 \quad \text{and} \quad 0 < 1 = f(1). \]

The Intermediate Value Theorem implies that a number $x$ exists, with $0 < x < 1$, for which
$x^5 - 2x^3 + 3x^2 - 1 = 0$.


As seen in Example 2, the Intermediate Value Theorem is used to determine when
solutions to certain problems exist. It does not, however, give an efficient means for finding
these solutions. This topic is considered in Chapter 2.


Integration


The other basic concept of calculus that will be used extensively is the Riemann integral.


Definition 1.12 The Riemann integral of the function $f$ on the interval $[a, b]$ is the following limit,
provided it exists:

\[ \int_a^b f(x)\,dx = \lim_{\max \Delta x_i \to 0} \sum_{i=1}^{n} f(z_i)\,\Delta x_i, \]

where the numbers $x_0, x_1, \ldots, x_n$ satisfy $a = x_0 \le x_1 \le \cdots \le x_n = b$, where $\Delta x_i = x_i - x_{i-1}$,
for each $i = 1, 2, \ldots, n$, and $z_i$ is arbitrarily chosen in the interval $[x_{i-1}, x_i]$.


Georg Friedrich Bernhard Riemann (1826-1866) made many of the important
discoveries classifying the functions that have integrals. He also did fundamental work in
geometry and complex function theory, and is regarded as one of the profound mathematicians of
the nineteenth century.


A function $f$ that is continuous on an interval $[a, b]$ is also Riemann integrable on
$[a, b]$. This permits us to choose, for computational convenience, the points $x_i$ to be equally
spaced in $[a, b]$, and for each $i = 1, 2, \ldots, n$, to choose $z_i = x_i$. In this case,

\[ \int_a^b f(x)\,dx = \lim_{n \to \infty} \frac{b - a}{n} \sum_{i=1}^{n} f(x_i), \]

where the numbers shown in Figure 1.8 as $x_i$ are $x_i = a + i(b - a)/n$.


[Figure 1.8: equally spaced points $x_i$ in $[a, b]$ used in the Riemann sum.]
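The equally spaced sum translates directly into code. The Python sketch below is an illustration under our own assumptions: the function name `riemann_sum` and the test integrand x² on [0, 1] (whose integral is 1/3) are not taken from the text.

```python
def riemann_sum(f, a, b, n):
    """(b - a)/n times the sum of f(x_i), with x_i = a + i(b - a)/n for i = 1, ..., n."""
    h = (b - a) / n
    return h * sum(f(a + i * h) for i in range(1, n + 1))

def square(x):
    return x * x          # illustrative integrand; its integral over [0, 1] is 1/3

approximations = {n: riemann_sum(square, 0.0, 1.0, n) for n in (10, 100, 1000)}

for n, value in approximations.items():
    print(f"n = {n:4d}: sum = {value:.7f}, error = {abs(value - 1 / 3):.2e}")
```

The error shrinks roughly in proportion to 1/n, so this sum is useful for understanding the definition but converges slowly as a practical method.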


Two other results will be needed in our study of numerical analysis. The first is a 
generalization of the usual Mean Value Theorem for Integrals. 
Theorem 1.13 (Weighted Mean Value Theorem for Integrals)
Suppose $f \in C[a, b]$, the Riemann integral of $g$ exists on $[a, b]$, and $g(x)$ does not change
sign on $[a, b]$. Then there exists a number $c$ in $(a, b)$ with

\[ \int_a^b f(x) g(x)\,dx = f(c) \int_a^b g(x)\,dx. \]


When $g(x) \equiv 1$, Theorem 1.13 is the usual Mean Value Theorem for Integrals. It gives
the average value of the function $f$ over the interval $[a, b]$ as (See Figure 1.9.)

\[ f(c) = \frac{1}{b - a} \int_a^b f(x)\,dx. \]


[Figure 1.9: illustration of the Mean Value Theorem for Integrals.]


The proof of Theorem 1.13 is not generally given in a basic calculus course but can be
found in most analysis texts (see, for example, [Fu], p. 162).


Taylor Polynomials and Series

The final theorem in this review from calculus describes the Taylor polynomials. These
polynomials are used extensively in numerical analysis.


Brook Taylor (1685-1731) described this series in 1715 in the paper Methodus
incrementorum directa et inversa. Special cases of the result, and likely the result itself, had been
previously known to Isaac Newton, James Gregory, and others.


Theorem 1.14 (Taylor’s Theorem)
Suppose $f \in C^n[a, b]$, that $f^{(n+1)}$ exists on $[a, b]$, and $x_0 \in [a, b]$. For every $x \in [a, b]$,
there exists a number $\xi(x)$ between $x_0$ and $x$ with

\[ f(x) = P_n(x) + R_n(x), \]

where

\[ P_n(x) = f(x_0) + f'(x_0)(x - x_0) + \frac{f''(x_0)}{2!}(x - x_0)^2 + \cdots + \frac{f^{(n)}(x_0)}{n!}(x - x_0)^n
          = \sum_{k=0}^{n} \frac{f^{(k)}(x_0)}{k!}(x - x_0)^k \]

Copyright 2010 Cengage Learning. All Rights Reserved. May not be copied, scanned, or duplicated, in whole or in part. Due to electronic rights, some third party content may be suppressed from the eBook and/or eChapter(s). 
Editorial review has deemed that any suppressed content does not materially affect the overall learning experience. Cengage Learning reserves the right to remove additional content at any time if subsequent rights restrictions require it. 


Colin Maclaurin (1698-1746) is 
best known as the defender of the 
calculus of Newton when it came 
under bitter attack by the Irish 
philosopher, the Bishop George 
Berkeley. 


Maclaurin did not discover the 
series that bears his name; it was 
known to 17th century 
mathematicians before he was 
born. However, he did devise a 
method for solving a system of 
linear equations that is known as 
Cramer’s rule, which Cramer did 
not publish until 1750. 


Example 3 


Figure 1.10 
and

\[ R_n(x) = \frac{f^{(n+1)}(\xi(x))}{(n+1)!}(x - x_0)^{n+1}. \]


Here $P_n(x)$ is called the nth Taylor polynomial for $f$ about $x_0$, and $R_n(x)$ is called the remainder term (or truncation error) associated with $P_n(x)$. Since the number $\xi(x)$ in the truncation error $R_n(x)$ depends on the value of $x$ at which the polynomial $P_n(x)$ is being evaluated, it is a function of the variable $x$. However, we should not expect to be able to explicitly determine the function $\xi(x)$. Taylor's Theorem simply ensures that such a function exists, and that its value lies between $x$ and $x_0$. In fact, one of the common problems in numerical methods is to try to determine a realistic bound for the value of $f^{(n+1)}(\xi(x))$ when $x$ is in some specified interval.

The infinite series obtained by taking the limit of $P_n(x)$ as $n \to \infty$ is called the Taylor series for $f$ about $x_0$. In the case $x_0 = 0$, the Taylor polynomial is often called a Maclaurin polynomial, and the Taylor series is often called a Maclaurin series.

The term truncation error in the Taylor polynomial refers to the error involved in 
using a truncated, or finite, summation to approximate the sum of an infinite series. 


Let $f(x) = \cos x$ and $x_0 = 0$. Determine
(a) the second Taylor polynomial for $f$ about $x_0$; and
(b) the third Taylor polynomial for $f$ about $x_0$.
Solution Since $f \in C^\infty(\mathbb{R})$, Taylor's Theorem can be applied for any $n \ge 0$. Also,

\[ f'(x) = -\sin x, \quad f''(x) = -\cos x, \quad f'''(x) = \sin x, \quad \text{and} \quad f^{(4)}(x) = \cos x, \]

so

\[ f(0) = 1, \quad f'(0) = 0, \quad f''(0) = -1, \quad \text{and} \quad f'''(0) = 0. \]

(a) For $n = 2$ and $x_0 = 0$, we have

\[ \cos x = f(0) + f'(0)x + \frac{f''(0)}{2!}x^2 + \frac{f'''(\xi(x))}{3!}x^3 = 1 - \frac{1}{2}x^2 + \frac{1}{6}x^3 \sin \xi(x), \]

where $\xi(x)$ is some (generally unknown) number between 0 and $x$. (See Figure 1.10.)
When $x = 0.01$, this becomes

\[ \cos 0.01 = 1 - \frac{1}{2}(0.01)^2 + \frac{1}{6}(0.01)^3 \sin \xi(0.01) = 0.99995 + \frac{10^{-6}}{6} \sin \xi(0.01). \]


The approximation to $\cos 0.01$ given by the Taylor polynomial is therefore 0.99995. The truncation error, or remainder term, associated with this approximation is

\[ \frac{10^{-6}}{6} \sin \xi(0.01) = 0.1\overline{6} \times 10^{-6} \sin \xi(0.01), \]

where the bar over the 6 in $0.1\overline{6}$ is used to indicate that this digit repeats indefinitely. Although we have no way of determining $\sin \xi(0.01)$, we know that all values of the sine lie in the interval $[-1, 1]$, so the error occurring if we use the approximation 0.99995 for the value of $\cos 0.01$ is bounded by

\[ |\cos(0.01) - 0.99995| = 0.1\overline{6} \times 10^{-6} |\sin \xi(0.01)| \le 0.1\overline{6} \times 10^{-6}. \]

Hence the approximation 0.99995 matches at least the first five digits of $\cos 0.01$, and

\[ 0.9999483 < 0.99995 - 1.6 \times 10^{-6} \le \cos 0.01 \le 0.99995 + 1.6 \times 10^{-6} < 0.9999517. \]

The error bound is much larger than the actual error. This is due in part to the poor bound we used for $|\sin \xi(x)|$. It is shown in Exercise 24 that for all values of $x$, we have $|\sin x| \le |x|$. Since $0 \le \xi < 0.01$, we could have used the fact that $|\sin \xi(x)| \le 0.01$ in the error formula, producing the bound $0.1\overline{6} \times 10^{-8}$.


(b) Since $f'''(0) = 0$, the third Taylor polynomial with remainder term about $x_0 = 0$ is

\[ \cos x = 1 - \frac{1}{2}x^2 + \frac{1}{24}x^4 \cos \tilde{\xi}(x), \]

where $0 < \tilde{\xi}(x) < 0.01$. The approximating polynomial remains the same, and the approximation is still 0.99995, but we now have much better accuracy assurance. Since $|\cos \tilde{\xi}(x)| \le 1$ for all $x$, we have

\[ \left| \frac{1}{24}x^4 \cos \tilde{\xi}(x) \right| \le \frac{1}{24}(0.01)^4(1) \approx 4.2 \times 10^{-10}. \]

So

\[ |\cos 0.01 - 0.99995| \le 4.2 \times 10^{-10}, \]

and

\[ 0.99994999958 = 0.99995 - 4.2 \times 10^{-10} \le \cos 0.01 \le 0.99995 + 4.2 \times 10^{-10} = 0.99995000042. \]


Example 3 illustrates the two objectives of numerical analysis: 
(i) Find an approximation to the solution of a given problem. 
(ii) Determine a bound for the accuracy of the approximation. 


The Taylor polynomials in both parts provide the same answer to (i), but the third Taylor polynomial gave a much better answer to (ii) than the second Taylor polynomial.
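The two bounds obtained in Example 3 can be checked numerically. The following Python sketch is not part of the Maple material in this text; the variable names are our own, and it assumes ordinary double-precision arithmetic.

```python
import math

x = 0.01
p = 1 - x**2 / 2              # P_2(x) and P_3(x) coincide for cos x about x0 = 0
actual_error = abs(math.cos(x) - p)
bound_second = x**3 / 6       # from |R_2(x)| <= |x|^3 / 3!, using |sin| <= 1
bound_third = x**4 / 24       # from |R_3(x)| <= |x|^4 / 4!, using |cos| <= 1

print(f"approximation     = {p:.11f}")
print(f"actual error      = {actual_error:.3e}")
print(f"bound from part a = {bound_second:.3e}")
print(f"bound from part b = {bound_third:.3e}")
```

Both bounds hold, but only the one from the third Taylor polynomial is close to the actual error.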
We can also use the Taylor polynomials to give us approximations to integrals. 
Illustration We can use the third Taylor polynomial and its remainder term found in Example 3 to approximate $\int_0^{0.1} \cos x\,dx$. We have

\[ \int_0^{0.1} \cos x\,dx = \int_0^{0.1} \left(1 - \frac{1}{2}x^2\right) dx + \frac{1}{24}\int_0^{0.1} x^4 \cos \tilde{\xi}(x)\,dx \]

\[ = \left[ x - \frac{1}{6}x^3 \right]_0^{0.1} + \frac{1}{24}\int_0^{0.1} x^4 \cos \tilde{\xi}(x)\,dx \]

\[ = 0.1 - \frac{1}{6}(0.1)^3 + \frac{1}{24}\int_0^{0.1} x^4 \cos \tilde{\xi}(x)\,dx. \]

Therefore

\[ \int_0^{0.1} \cos x\,dx \approx 0.1 - \frac{1}{6}(0.1)^3 = 0.0998\overline{3}. \]


A bound for the error in this approximation is determined from the integral of the Taylor remainder term and the fact that $|\cos \tilde{\xi}(x)| \le 1$ for all $x$:

\[ \frac{1}{24}\left| \int_0^{0.1} x^4 \cos \tilde{\xi}(x)\,dx \right| \le \frac{1}{24}\int_0^{0.1} x^4 |\cos \tilde{\xi}(x)|\,dx \le \frac{1}{24}\int_0^{0.1} x^4\,dx = \frac{(0.1)^5}{120} = 8.\overline{3} \times 10^{-8}. \]


The true value of this integral is

\[ \int_0^{0.1} \cos x\,dx = \sin x \Big|_0^{0.1} = \sin 0.1 \approx 0.099833416647, \]

so the actual error for this approximation is $8.3314 \times 10^{-8}$, which is within the error bound.
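As a quick numerical check of this Illustration, the following Python sketch (our own, not from the text, and separate from the Maple session that follows) compares the integrated Taylor polynomial with $\sin 0.1$ and with the bound $(0.1)^5/120$.

```python
import math

approx = 0.1 - 0.1**3 / 6          # integral of 1 - x^2/2 over [0, 0.1]
true_value = math.sin(0.1)         # antiderivative of cos x evaluated at 0.1
actual_error = abs(true_value - approx)
error_bound = 0.1**5 / 120         # (1/24) times the integral of x^4 over [0, 0.1]

print(f"approximation = {approx:.12f}")
print(f"true value    = {true_value:.12f}")
print(f"actual error  = {actual_error:.4e}")
print(f"error bound   = {error_bound:.4e}")
```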


We can also use Maple to obtain these results. Define f by 
f := cos(x) 


Maple allows us to place multiple statements on a line separated by either a semicolon or 
a colon. A semicolon will produce all the output, and a colon suppresses all but the final 
Maple response. For example, the third Taylor polynomial is given by 


s3 := taylor(f, x = 0, 4): p3 := convert(s3, polynom)

\[ 1 - \frac{1}{2}x^2 \]

The first statement s3 := taylor(f, x = 0, 4) determines the Taylor polynomial about $x_0 = 0$ with four terms (degree 3) and an indication of its remainder. The second p3 := convert(s3, polynom) converts the series s3 to the polynomial p3 by dropping the remainder term.

Maple normally displays 10 decimal digits for approximations. To instead obtain the 11 digits we want for this illustration, enter

Digits := 11

and evaluate $f(0.01)$ and $P_3(0.01)$ with

y1 := evalf(subs(x = 0.01, f)); y2 := evalf(subs(x = 0.01, p3))

This produces

0.99995000042
0.99995000000


To show both the function (in black) and the polynomial (in cyan) near $x_0 = 0$, we enter

plot((f, p3), x = -2..2)

and obtain the Maple plot shown in Figure 1.11.


Figure 1.11 


The integrals of $f$ and the polynomial are given by

q1 := int(f, x = 0..0.1); q2 := int(p3, x = 0..0.1)

0.099833416647
0.099833333333

We assigned the names q1 and q2 to these values so that we could easily determine the error with the command

err := |q1 - q2|

\[ 8.3314 \times 10^{-8} \]

There is an alternate method for generating the Taylor polynomials within the NumericalAnalysis subpackage of Maple's Student package. This subpackage will be discussed in Chapter 2.


EXERCISE SET 1.1 


1. Show that the following equations have at least one solution in the given intervals.
a. $x\cos x - 2x^2 + 3x - 1 = 0$, [0.2, 0.3] and [1.2, 1.3]
b. $(x - 2)^2 - \ln x = 0$, [1, 2] and [e, 4]
c. $2x\cos(2x) - (x - 2)^2 = 0$, [2, 3] and [3, 4]
d. $x - (\ln x)^x = 0$, [4, 5]
2. Find intervals containing solutions to the following equations.
a. $x - 3^{-x} = 0$
b. $4x^2 - e^x = 0$
c. $x^3 - 2x^2 - 4x + 2 = 0$
d. $x^3 + 4.001x^2 + 4.002x + 1.101 = 0$
3. Show that $f'(x)$ is 0 at least once in the given intervals.
a. $f(x) = 1 - e^x + (e - 1)\sin((\pi/2)x)$, [0, 1]
b. $f(x) = (x - 1)\tan x + x\sin \pi x$, [0, 1]
c. $f(x) = x\sin \pi x - (x - 2)\ln x$, [1, 2]
d. $f(x) = (x - 2)\sin x \ln(x + 2)$, [-1, 3]
4. Find $\max_{a \le x \le b} |f(x)|$ for the following functions and intervals.
a. $f(x) = (2 - e^x + 2x)/3$, [0, 1]
b. $f(x) = (4x - 3)/(x^2 - 2x)$, [0.5, 1]
c. $f(x) = 2x\cos(2x) - (x - 2)^2$, [2, 4]
d. $f(x) = 1 + e^{-\cos(x-1)}$, [1, 2]
5. Use the Intermediate Value Theorem 1.11 and Rolle's Theorem 1.7 to show that the graph of $f(x) = x^3 + 2x + k$ crosses the x-axis exactly once, regardless of the value of the constant $k$.

6. Suppose $f \in C[a, b]$ and $f'(x)$ exists on $(a, b)$. Show that if $f'(x) \ne 0$ for all $x$ in $(a, b)$, then there can exist at most one number $p$ in $[a, b]$ with $f(p) = 0$.


7. Let $f(x) = x^3$.
a. Find the second Taylor polynomial $P_2(x)$ about $x_0 = 0$.
b. Find $R_2(0.5)$ and the actual error in using $P_2(0.5)$ to approximate $f(0.5)$.
c. Repeat part (a) using $x_0 = 1$.
d. Repeat part (b) using the polynomial from part (c).
8. Find the third Taylor polynomial $P_3(x)$ for the function $f(x) = \sqrt{x + 1}$ about $x_0 = 0$. Approximate $\sqrt{0.5}$, $\sqrt{0.75}$, $\sqrt{1.25}$, and $\sqrt{1.5}$ using $P_3(x)$, and find the actual errors.
9. Find the second Taylor polynomial $P_2(x)$ for the function $f(x) = e^x \cos x$ about $x_0 = 0$.
a. Use $P_2(0.5)$ to approximate $f(0.5)$. Find an upper bound for error $|f(0.5) - P_2(0.5)|$ using the error formula, and compare it to the actual error.
b. Find a bound for the error $|f(x) - P_2(x)|$ in using $P_2(x)$ to approximate $f(x)$ on the interval [0, 1].
c. Approximate $\int_0^1 f(x)\,dx$ using $\int_0^1 P_2(x)\,dx$.
d. Find an upper bound for the error in (c) using $\int_0^1 |R_2(x)\,dx|$, and compare the bound to the actual error.
10. Repeat Exercise 9 using $x_0 = \pi/6$.
11. Find the third Taylor polynomial $P_3(x)$ for the function $f(x) = (x - 1)\ln x$ about $x_0 = 1$.
a. Use $P_3(0.5)$ to approximate $f(0.5)$. Find an upper bound for error $|f(0.5) - P_3(0.5)|$ using the error formula, and compare it to the actual error.
b. Find a bound for the error $|f(x) - P_3(x)|$ in using $P_3(x)$ to approximate $f(x)$ on the interval [0.5, 1.5].
c. Approximate $\int_{0.5}^{1.5} f(x)\,dx$ using $\int_{0.5}^{1.5} P_3(x)\,dx$.
d. Find an upper bound for the error in (c) using $\int_{0.5}^{1.5} |R_3(x)\,dx|$, and compare the bound to the actual error.
12. Let $f(x) = 2x\cos(2x) - (x - 2)^2$ and $x_0 = 0$.
a. Find the third Taylor polynomial $P_3(x)$, and use it to approximate $f(0.4)$.
b. Use the error formula in Taylor's Theorem to find an upper bound for the error $|f(0.4) - P_3(0.4)|$. Compute the actual error.
c. Find the fourth Taylor polynomial $P_4(x)$, and use it to approximate $f(0.4)$.
d. Use the error formula in Taylor's Theorem to find an upper bound for the error $|f(0.4) - P_4(0.4)|$. Compute the actual error.


13. Find the fourth Taylor polynomial $P_4(x)$ for the function $f(x) = xe^{x^2}$ about $x_0 = 0$.
a. Find an upper bound for $|f(x) - P_4(x)|$, for $0 \le x \le 0.4$.
b. Approximate $\int_0^{0.4} f(x)\,dx$ using $\int_0^{0.4} P_4(x)\,dx$.
c. Find an upper bound for the error in (b) using $\int_0^{0.4} P_4(x)\,dx$.
d. Approximate $f'(0.2)$ using $P_4'(0.2)$, and find the error.
14. Use the error term of a Taylor polynomial to estimate the error involved in using $\sin x \approx x$ to approximate $\sin 1^\circ$.
15. Use a Taylor polynomial about $\pi/4$ to approximate $\cos 42^\circ$ to an accuracy of $10^{-6}$.
16. Let $f(x) = e^{x/2}\sin(x/3)$. Use Maple to determine the following.
a. The third Maclaurin polynomial $P_3(x)$.
b. $f^{(4)}(x)$ and a bound for the error $|f(x) - P_3(x)|$ on [0, 1].
17. Let $f(x) = \ln(x^2 + 2)$. Use Maple to determine the following.
a. The Taylor polynomial $P_3(x)$ for $f$ expanded about $x_0 = 1$.
b. The maximum error $|f(x) - P_3(x)|$, for $0 \le x \le 1$.
c. The Maclaurin polynomial $\tilde{P}_3(x)$ for $f$.
d. The maximum error $|f(x) - \tilde{P}_3(x)|$, for $0 \le x \le 1$.
e. Does $P_3(0)$ approximate $f(0)$ better than $\tilde{P}_3(1)$ approximates $f(1)$?


18. Let $f(x) = (1 - x)^{-1}$ and $x_0 = 0$. Find the nth Taylor polynomial $P_n(x)$ for $f(x)$ about $x_0$. Find a value of $n$ necessary for $P_n(x)$ to approximate $f(x)$ to within $10^{-6}$ on [0, 0.5].

19. Let $f(x) = e^x$ and $x_0 = 0$. Find the nth Taylor polynomial $P_n(x)$ for $f(x)$ about $x_0$. Find a value of $n$ necessary for $P_n(x)$ to approximate $f(x)$ to within $10^{-6}$ on [0, 0.5].

20. Find the nth Maclaurin polynomial $P_n(x)$ for $f(x) = \arctan x$.

21. The polynomial $P_2(x) = 1 - \frac{1}{2}x^2$ is to be used to approximate $f(x) = \cos x$ in $[-\frac{1}{2}, \frac{1}{2}]$. Find a bound for the maximum error.


22. The nth Taylor polynomial for a function $f$ at $x_0$ is sometimes referred to as the polynomial of degree at most $n$ that "best" approximates $f$ near $x_0$.
a. Explain why this description is accurate.
b. Find the quadratic polynomial that best approximates a function $f$ near $x_0 = 1$ if the tangent line at $x_0 = 1$ has equation $y = 4x - 1$, and if $f''(1) = 6$.
23. Prove the Generalized Rolle's Theorem, Theorem 1.10, by verifying the following.
a. Use Rolle's Theorem to show that $f'(z_i) = 0$ for $n - 1$ numbers in $[a, b]$ with $a < z_1 < z_2 < \cdots < z_{n-1} < b$.
b. Use Rolle's Theorem to show that $f''(w_i) = 0$ for $n - 2$ numbers in $[a, b]$ with $z_1 < w_1 < z_2 < w_2 \cdots w_{n-2} < z_{n-1} < b$.
c. Continue the arguments in a. and b. to show that for each $j = 1, 2, \ldots, n - 1$ there are $n - j$ distinct numbers in $[a, b]$ where $f^{(j)}$ is 0.
d. Show that part c. implies the conclusion of the theorem.
24. In Example 3 it is stated that for all $x$ we have $|\sin x| \le |x|$. Use the following to verify this statement.
a. Show that for all $x \ge 0$ we have $f(x) = x - \sin x$ is non-decreasing, which implies that $\sin x \le x$ with equality only when $x = 0$.
b. Use the fact that the sine function is odd to reach the conclusion.

25. A Maclaurin polynomial for $e^x$ is used to give the approximation 2.5 to $e$. The error bound in this approximation is established to be $E = \frac{1}{6}$. Find a bound for the error in $E$.


26. The error function defined by

\[ \operatorname{erf}(x) = \frac{2}{\sqrt{\pi}} \int_0^x e^{-t^2}\,dt \]

gives the probability that any one of a series of trials will lie within $x$ units of the mean, assuming that the trials have a normal distribution with mean 0 and standard deviation $\sqrt{2}/2$. This integral cannot be evaluated in terms of elementary functions, so an approximating technique must be used.

a. Integrate the Maclaurin series for $e^{-x^2}$ to show that

\[ \operatorname{erf}(x) = \frac{2}{\sqrt{\pi}} \sum_{k=0}^{\infty} \frac{(-1)^k x^{2k+1}}{(2k+1)\,k!}. \]

b. The error function can also be expressed in the form

\[ \operatorname{erf}(x) = \frac{2}{\sqrt{\pi}}\, e^{-x^2} \sum_{k=0}^{\infty} \frac{2^k x^{2k+1}}{1 \cdot 3 \cdot 5 \cdots (2k+1)}. \]

Verify that the two series agree for $k = 1, 2, 3$, and 4. [Hint: Use the Maclaurin series for $e^{-x^2}$.]
c. Use the series in part (a) to approximate erf(1) to within $10^{-7}$.
d. Use the same number of terms as in part (c) to approximate erf(1) with the series in part (b).
e. Explain why difficulties occur using the series in part (b) to approximate erf(x).
27. A function $f : [a, b] \to \mathbb{R}$ is said to satisfy a Lipschitz condition with Lipschitz constant $L$ on $[a, b]$ if, for every $x, y \in [a, b]$, we have $|f(x) - f(y)| \le L|x - y|$.
a. Show that if $f$ satisfies a Lipschitz condition with Lipschitz constant $L$ on an interval $[a, b]$, then $f \in C[a, b]$.
b. Show that if $f$ has a derivative that is bounded on $[a, b]$ by $L$, then $f$ satisfies a Lipschitz condition with Lipschitz constant $L$ on $[a, b]$.
c. Give an example of a function that is continuous on a closed interval but does not satisfy a Lipschitz condition on the interval.


28. Suppose $f \in C[a, b]$, that $x_1$ and $x_2$ are in $[a, b]$.
a. Show that a number $\xi$ exists between $x_1$ and $x_2$ with

\[ f(\xi) = \frac{f(x_1) + f(x_2)}{2} = \frac{1}{2}f(x_1) + \frac{1}{2}f(x_2). \]

b. Suppose that $c_1$ and $c_2$ are positive constants. Show that a number $\xi$ exists between $x_1$ and $x_2$ with

\[ f(\xi) = \frac{c_1 f(x_1) + c_2 f(x_2)}{c_1 + c_2}. \]

c. Give an example to show that the result in part b. does not necessarily hold when $c_1$ and $c_2$ have opposite signs with $c_1 \ne -c_2$.
29. Let $f \in C[a, b]$, and let $p$ be in the open interval $(a, b)$.
a. Suppose $f(p) \ne 0$. Show that a $\delta > 0$ exists with $f(x) \ne 0$, for all $x$ in $[p - \delta, p + \delta]$, with $[p - \delta, p + \delta]$ a subset of $[a, b]$.
b. Suppose $f(p) = 0$ and $k > 0$ is given. Show that a $\delta > 0$ exists with $|f(x)| \le k$, for all $x$ in $[p - \delta, p + \delta]$, with $[p - \delta, p + \delta]$ a subset of $[a, b]$.


1.2 Round-off Errors and Computer Arithmetic


The arithmetic performed by a calculator or computer is different from the arithmetic in algebra and calculus courses. You would likely expect that we always have as true statements things such as $2 + 2 = 4$, $4 \cdot 8 = 32$, and $(\sqrt{3})^2 = 3$. However, with computer arithmetic we expect exact results for $2 + 2 = 4$ and $4 \cdot 8 = 32$, but we will not have precisely $(\sqrt{3})^2 = 3$. To understand why this is true we must explore the world of finite-digit arithmetic.

In our traditional mathematical world we permit numbers with an infinite number of digits. The arithmetic we use in this world defines $\sqrt{3}$ as that unique positive number that when multiplied by itself produces the integer 3. In the computational world, however, each representable number has only a fixed and finite number of digits. This means, for example, that only rational numbers (and not even all of these) can be represented exactly. Since $\sqrt{3}$ is not rational, it is given an approximate representation, one whose square will not be precisely 3, although it will likely be sufficiently close to 3 to be acceptable in most situations. In most cases, then, this machine arithmetic is satisfactory and passes without notice or concern, but at times problems arise because of this discrepancy.

The error that is produced when a calculator or computer is used to perform real-number calculations is called round-off error. It occurs because the arithmetic performed in a machine involves numbers with only a finite number of digits, with the result that calculations are performed with only approximate representations of the actual numbers. In a computer, only a relatively small subset of the real number system is used for the representation of all the real numbers. This subset contains only rational numbers, both positive and negative, and stores the fractional part, together with an exponential part.

Error due to rounding should be expected whenever computations are performed using numbers that are not powers of 2. Keeping this error under control is extremely important when the number of calculations is large.


Binary Machine Numbers 


In 1985, the IEEE (Institute of Electrical and Electronics Engineers) published a report called
Binary Floating Point Arithmetic Standard 754-1985. An updated version was published 
in 2008 as IEEE 754-2008. This provides standards for binary and decimal floating point 
numbers, formats for data interchange, algorithms for rounding arithmetic operations, and 
for the handling of exceptions. Formats are specified for single, double, and extended 
precisions, and these standards are generally followed by all microcomputer manufacturers 
using floating-point hardware. 

A 64-bit (binary digit) representation is used for a real number. The first bit is a sign 
indicator, denoted s. This is followed by an 11-bit exponent, c, called the characteristic, 
and a 52-bit binary fraction, f, called the mantissa. The base for the exponent is 2. 

Since 52 binary digits correspond to between 16 and 17 decimal digits, we can assume that a number represented in this system has at least 16 decimal digits of precision. The exponent of 11 binary digits gives a range of 0 to $2^{11} - 1 = 2047$. However, using only positive integers for the exponent would not permit an adequate representation of numbers with small magnitude. To ensure that numbers with small magnitude are equally representable, 1023 is subtracted from the characteristic, so the range of the exponent is actually from $-1023$ to 1024.

To save storage and provide a unique representation for each floating-point number, a normalization is imposed. Using this system gives a floating-point number of the form

\[ (-1)^s\, 2^{c-1023}(1 + f). \]


Illustration Consider the machine number 
0 10000000011 1011100100010000000000000000000000000000000000000000. 


The leftmost bit is s = 0, which indicates that the number is positive. The next 11 bits, 
10000000011, give the characteristic and are equivalent to the decimal number 


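\[ c = 1 \cdot 2^{10} + 0 \cdot 2^9 + \cdots + 0 \cdot 2^2 + 1 \cdot 2^1 + 1 \cdot 2^0 = 1024 + 2 + 1 = 1027. \]

The exponential part of the number is, therefore, $2^{1027-1023} = 2^4$. The final 52 bits specify that the mantissa is

\[ f = 1 \cdot \left(\frac{1}{2}\right)^1 + 1 \cdot \left(\frac{1}{2}\right)^3 + 1 \cdot \left(\frac{1}{2}\right)^4 + 1 \cdot \left(\frac{1}{2}\right)^5 + 1 \cdot \left(\frac{1}{2}\right)^8 + 1 \cdot \left(\frac{1}{2}\right)^{12}. \]

As a consequence, this machine number precisely represents the decimal number

\[ (-1)^s\, 2^{c-1023}(1 + f) = (-1)^0 \cdot 2^{1027-1023}\left(1 + \left(\frac{1}{2} + \frac{1}{8} + \frac{1}{16} + \frac{1}{32} + \frac{1}{256} + \frac{1}{4096}\right)\right) = 27.56640625. \]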
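The decoding in this Illustration can be reproduced with a few lines of Python. This is a sketch of our own rather than part of the text; it assumes the platform's `float` is an IEEE 754 64-bit double, and the names `bits`, `s`, `c`, and `f` simply mirror the notation above.

```python
import struct

# sign bit, 11-bit characteristic, 52-bit mantissa (12 leading bits, then 40 zeros)
bits = "0" + "10000000011" + "101110010001" + "0" * 40

s = int(bits[0], 2)                  # sign indicator
c = int(bits[1:12], 2)               # characteristic
f = int(bits[12:], 2) / 2**52        # binary fraction (mantissa)
value = (-1) ** s * 2.0 ** (c - 1023) * (1 + f)

# Cross-check: reinterpret the same 64-bit pattern directly as a double.
packed = struct.unpack(">d", int(bits, 2).to_bytes(8, "big"))[0]

print(s, c, f, value, packed)
```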

However, the next smallest machine number is

0 10000000011 1011100100001111111111111111111111111111111111111111,

and the next largest machine number is

0 10000000011 1011100100010000000000000000000000000000000000000001.

This means that our original machine number represents not only 27.56640625, but also half of the real numbers that are between 27.56640625 and the next smallest machine number, as well as half the numbers between 27.56640625 and the next largest machine number. To be precise, it represents any real number in the interval

[27.5664062499999982236431605997495353221893310546875,
27.5664062500000017763568394002504646778106689453125).


The smallest normalized positive number that can be represented has $s = 0$, $c = 1$, and $f = 0$ and is equivalent to

\[ 2^{-1022} \cdot (1 + 0) \approx 0.22251 \times 10^{-307}, \]

and the largest has $s = 0$, $c = 2046$, and $f = 1 - 2^{-52}$ and is equivalent to

\[ 2^{1023} \cdot (2 - 2^{-52}) \approx 0.17977 \times 10^{309}. \]

Numbers occurring in calculations that have a magnitude less than

\[ 2^{-1022} \cdot (1 + 0) \]

result in underflow and are generally set to zero. Numbers greater than

\[ 2^{1023} \cdot (2 - 2^{-52}) \]

result in overflow and typically cause the computations to stop (unless the program has been designed to detect this occurrence). Note that there are two representations for the number zero; a positive 0 when $s = 0$, $c = 0$ and $f = 0$, and a negative 0 when $s = 1$, $c = 0$ and $f = 0$.
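These limits can be inspected directly in Python, whose `float` type is assumed here to be an IEEE 754 double. The sketch below is ours, not the text's. Note one difference from the simplified description above: IEEE arithmetic underflows gradually through subnormal numbers, so a result only becomes exactly zero once it falls well below $2^{-1022}$, and Python returns `inf` on a multiplication overflow rather than stopping.

```python
import math
import sys

smallest = 2.0 ** -1022                    # s = 0, c = 1, f = 0
largest = 2.0 ** 1023 * (2 - 2.0 ** -52)   # s = 0, c = 2046, f = 1 - 2^(-52)

print(smallest == sys.float_info.min)      # smallest normalized positive double
print(largest == sys.float_info.max)       # largest finite double
print(largest * 2)                         # overflow
print(smallest / 2.0 ** 54)                # far below the subnormal range
print(0.0 == -0.0, math.copysign(1.0, -0.0))  # two zeros that compare equal
```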
The error that results from
replacing a number with its 
floating-point form is called 
round-off error regardless of 
whether the rounding or 
chopping method is used. 


Example 1 


The relative error is generally a 
better measure of accuracy than 
the absolute error because it takes 
into consideration the size of the 
number being approximated. 


Definition 1.15 




Decimal Machine Numbers 


The use of binary digits tends to conceal the computational difficulties that occur when a 
finite collection of machine numbers is used to represent all the real numbers. To examine 
these problems, we will use more familiar decimal numbers instead of binary representation. 
Specifically, we assume that machine numbers are represented in the normalized decimal 
floating-point form 


\[ \pm 0.d_1 d_2 \ldots d_k \times 10^n, \quad 1 \le d_1 \le 9, \quad \text{and} \quad 0 \le d_i \le 9, \]

for each $i = 2, \ldots, k$. Numbers of this form are called k-digit decimal machine numbers.
Any positive real number within the numerical range of the machine can be normalized to the form

\[ y = 0.d_1 d_2 \ldots d_k d_{k+1} d_{k+2} \ldots \times 10^n. \]


The floating-point form of $y$, denoted $fl(y)$, is obtained by terminating the mantissa of $y$ at $k$ decimal digits. There are two common ways of performing this termination. One method, called chopping, is to simply chop off the digits $d_{k+1} d_{k+2} \ldots$. This produces the floating-point form

\[ fl(y) = 0.d_1 d_2 \ldots d_k \times 10^n. \]

The other method, called rounding, adds $5 \times 10^{n-(k+1)}$ to $y$ and then chops the result to obtain a number of the form

\[ fl(y) = 0.\delta_1 \delta_2 \ldots \delta_k \times 10^n. \]

For rounding, when $d_{k+1} \ge 5$, we add 1 to $d_k$ to obtain $fl(y)$; that is, we round up. When $d_{k+1} < 5$, we simply chop off all but the first $k$ digits; so we round down. If we round down, then $\delta_i = d_i$, for each $i = 1, 2, \ldots, k$. However, if we round up, the digits (and even the exponent) might change.


Determine the five-digit (a) chopping and (b) rounding values of the irrational number $\pi$.

Solution The number $\pi$ has an infinite decimal expansion of the form $\pi = 3.14159265\ldots$. Written in normalized decimal form, we have

\[ \pi = 0.314159265\ldots \times 10^1. \]

(a) The floating-point form of $\pi$ using five-digit chopping is

\[ fl(\pi) = 0.31415 \times 10^1 = 3.1415. \]

(b) The sixth digit of the decimal expansion of $\pi$ is a 9, so the floating-point form of $\pi$ using five-digit rounding is

\[ fl(\pi) = (0.31415 + 0.00001) \times 10^1 = 3.1416. \]
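Chopping and rounding to $k$ decimal digits can be imitated with Python's standard `decimal` module. The helper `fl` below is a hypothetical name of our own, and the sketch assumes that `ROUND_DOWN` (truncation toward zero) models chopping and `ROUND_HALF_UP` models the rounding rule described above.

```python
from decimal import Context, Decimal, ROUND_DOWN, ROUND_HALF_UP

def fl(y, k, mode="chop"):
    """Return the k-digit decimal floating-point form of y (a numeric string)."""
    rounding = ROUND_DOWN if mode == "chop" else ROUND_HALF_UP
    return Context(prec=k, rounding=rounding).create_decimal(y)

pi_digits = "3.14159265358979"       # enough digits of pi for this example
chopped = fl(pi_digits, 5, "chop")
rounded = fl(pi_digits, 5, "round")
print(chopped, rounded)
```

The two printed values correspond to parts (a) and (b) of Example 1.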
The following definition describes two methods for measuring approximation errors. 


Suppose that $p^*$ is an approximation to $p$. The absolute error is $|p - p^*|$, and the relative error is $\dfrac{|p - p^*|}{|p|}$, provided that $p \ne 0$.


Consider the absolute and relative errors in representing p by p* in the following 
example. 
Example 2 Determine the absolute and relative errors when approximating p by p* when


We often cannot find an accurate
value for the true error in an
approximation. Instead we find a
bound for the error, which gives
us a "worst-case" error.


Definition 1.16 


The term significant digits is 
often used to loosely describe the 
number of decimal digits that 
appear to be accurate. The 
definition is more precise, and 
provides a continuous concept. 


Table 1.1 


(a) $p = 0.3000 \times 10^1$ and $p^* = 0.3100 \times 10^1$;
(b) $p = 0.3000 \times 10^{-3}$ and $p^* = 0.3100 \times 10^{-3}$;
(c) $p = 0.3000 \times 10^4$ and $p^* = 0.3100 \times 10^4$.


Solution


(a) For $p = 0.3000 \times 10^1$ and $p^* = 0.3100 \times 10^1$ the absolute error is 0.1, and the relative error is $0.333\overline{3} \times 10^{-1}$.

(b) For $p = 0.3000 \times 10^{-3}$ and $p^* = 0.3100 \times 10^{-3}$ the absolute error is $0.1 \times 10^{-4}$, and the relative error is $0.333\overline{3} \times 10^{-1}$.

(c) For $p = 0.3000 \times 10^4$ and $p^* = 0.3100 \times 10^4$, the absolute error is $0.1 \times 10^3$, and the relative error is again $0.333\overline{3} \times 10^{-1}$.

This example shows that the same relative error, $0.333\overline{3} \times 10^{-1}$, occurs for widely varying absolute errors. As a measure of accuracy, the absolute error can be misleading and the relative error more meaningful, because the relative error takes into consideration the size of the value.
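The three cases of Example 2 are easy to verify with exact rational arithmetic. The following Python sketch is our own illustration; the function name `errors` is hypothetical.

```python
from fractions import Fraction

def errors(p, p_star):
    """Return (absolute error, relative error) of p_star as an approximation to p."""
    p, p_star = Fraction(p), Fraction(p_star)
    absolute = abs(p - p_star)
    return absolute, absolute / abs(p)

cases = [("3.000", "3.100"), ("0.0003000", "0.0003100"), ("3000", "3100")]
results = [errors(p, p_star) for p, p_star in cases]
for (p, p_star), (absolute, relative) in zip(cases, results):
    print(p, p_star, float(absolute), float(relative))
```

The absolute errors differ by factors of $10^4$ and more, while the relative error is $1/30$ in every case.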


The following definition uses relative error to give a measure of significant digits of 
accuracy for an approximation. 


The number $p^*$ is said to approximate $p$ to $t$ significant digits (or figures) if $t$ is the largest nonnegative integer for which

\[ \frac{|p - p^*|}{|p|} \le 5 \times 10^{-t}. \]


Table 1.1 illustrates the continuous nature of significant digits by listing, for the various values of $p$, the least upper bound of $|p - p^*|$, denoted $\max |p - p^*|$, when $p^*$ agrees with $p$ to four significant digits.


p                 0.1       0.5       100    1000   5000   9990    10000
max |p - p*|      0.00005   0.00025   0.05   0.5    2.5    4.995   5.
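Definition 1.16 translates directly into a short search for the largest admissible $t$. The Python sketch below is ours; `significant_digits` is a hypothetical helper, and the cap `t_max` is an assumption added only so that the loop terminates when $p^* = p$.

```python
from fractions import Fraction

def significant_digits(p, p_star, t_max=50):
    """Largest nonnegative integer t with |p - p*|/|p| <= 5 * 10**(-t), or None."""
    p, p_star = Fraction(p), Fraction(p_star)
    relative = abs(p - p_star) / abs(p)
    if relative > 5:
        return None                  # the inequality fails even for t = 0
    t = 0
    while t < t_max and relative <= Fraction(5, 10 ** (t + 1)):
        t += 1
    return t

print(significant_digits("1000", "1000.5"))    # error equal to the Table 1.1 bound
print(significant_digits("1000", "1000.6"))    # error slightly beyond that bound
print(significant_digits("9990", "9994.995"))
```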


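Returning to the machine representation of numbers, we see that the floating-point representation $fl(y)$ for the number $y$ has the relative error

\[ \left| \frac{y - fl(y)}{y} \right|. \]

If $k$ decimal digits and chopping are used for the machine representation of

\[ y = 0.d_1 d_2 \ldots d_k d_{k+1} \ldots \times 10^n, \]

then

\[ \left| \frac{y - fl(y)}{y} \right| = \left| \frac{0.d_1 d_2 \ldots d_k d_{k+1} \ldots \times 10^n - 0.d_1 d_2 \ldots d_k \times 10^n}{0.d_1 d_2 \ldots \times 10^n} \right| \]

\[ = \left| \frac{0.d_{k+1} d_{k+2} \ldots \times 10^{n-k}}{0.d_1 d_2 \ldots \times 10^n} \right| = \left| \frac{0.d_{k+1} d_{k+2} \ldots}{0.d_1 d_2 \ldots} \right| \times 10^{-k}. \]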


Since $d_1 \ne 0$, the minimal value of the denominator is 0.1. The numerator is bounded above by 1. As a consequence,

\[ \left| \frac{y - fl(y)}{y} \right| \le \frac{1}{0.1} \times 10^{-k} = 10^{-k+1}. \]

In a similar manner, a bound for the relative error when using k-digit rounding arithmetic is $0.5 \times 10^{-k+1}$. (See Exercise 24.)
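The two bounds can be spot-checked on a handful of values, including some chosen to sit close to the worst case. This Python sketch is our own; it assumes the `decimal` module's `ROUND_DOWN` and `ROUND_HALF_UP` modes as stand-ins for chopping and rounding, and the sample values are arbitrary.

```python
from decimal import Context, Decimal, ROUND_DOWN, ROUND_HALF_UP

def relative_error(y, k, rounding):
    """Relative error of the k-digit floating-point form of y."""
    y = Decimal(y)
    fl_y = Context(prec=k, rounding=rounding).create_decimal(y)
    return abs((y - fl_y) / y)

k = 4
samples = ["0.10009999", "0.10005", "3.14159265", "271.828182", "0.00099999999"]
chop_errors = [relative_error(y, k, ROUND_DOWN) for y in samples]
round_errors = [relative_error(y, k, ROUND_HALF_UP) for y in samples]
chop_bound = Decimal(10) ** (-k + 1)         # 10^(-k+1)
round_bound = Decimal("0.5") * chop_bound    # 0.5 * 10^(-k+1)

for y, ce, re in zip(samples, chop_errors, round_errors):
    print(y, f"{ce:.3E}", f"{re:.3E}")
```

The bounds do not depend on the magnitude of $y$, which is why samples of very different sizes can share them.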

Note that the bounds for the relative error using k-digit arithmetic are independent of the 
number being represented. This result is due to the manner in which the machine numbers 
are distributed along the real line. Because of the exponential form of the characteristic, 
the same number of decimal machine numbers is used to represent each of the intervals 
[0.1, 1], [1, 10], and [10, 100]. In fact, within the limits of the machine, the number of
decimal machine numbers in $[10^n, 10^{n+1}]$ is constant for all integers $n$.


Finite-Digit Arithmetic 


In addition to inaccurate representation of numbers, the arithmetic performed in a computer 
is not exact. The arithmetic involves manipulating binary digits by various shifting, or 
logical, operations. Since the actual mechanics of these operations are not pertinent to this 
presentation, we shall devise our own approximation to computer arithmetic. Although our 
arithmetic will not give the exact picture, it suffices to explain the problems that occur. (For 
an explanation of the manipulations actually involved, the reader is urged to consult more 
technically oriented computer science texts, such as [Ma], Computer System Architecture.) 

Assume that the floating-point representations $fl(x)$ and $fl(y)$ are given for the real numbers $x$ and $y$ and that the symbols $\oplus$, $\ominus$, $\otimes$, $\oslash$ represent machine addition, subtraction, multiplication, and division operations, respectively. We will assume a finite-digit arithmetic given by

\[ x \oplus y = fl(fl(x) + fl(y)), \qquad x \otimes y = fl(fl(x) \times fl(y)), \]
\[ x \ominus y = fl(fl(x) - fl(y)), \qquad x \oslash y = fl(fl(x) \div fl(y)). \]


This arithmetic corresponds to performing exact arithmetic on the floating-point repre- 
sentations of x and y and then converting the exact result to its finite-digit floating-point 
representation. 

Rounding arithmetic is easily implemented in Maple. For example, the command 


Digits := 5 


causes all arithmetic to be rounded to 5 digits. To ensure that Maple uses approximate rather than exact arithmetic we use evalf. For example, if $x = \pi$ and $y = \sqrt{2}$ then

evalf(x); evalf(y)

produces 3.1416 and 1.4142, respectively. Then $fl(fl(x) + fl(y))$ is performed using 5-digit rounding arithmetic with

evalf(evalf(x) + evalf(y))

which gives 4.5558. Implementing finite-digit chopping arithmetic is more difficult and requires a sequence of steps or a procedure. Exercise 27 explores this problem.


Example 3 Suppose that $x = \frac{5}{7}$ and $y = \frac{1}{3}$. Use five-digit chopping for calculating $x + y$, $x - y$, $x \times y$, and $x \div y$.

Solution Note that

\[ x = \frac{5}{7} = 0.\overline{714285} \quad \text{and} \quad y = \frac{1}{3} = 0.\overline{3} \]

implies that the five-digit chopping values of $x$ and $y$ are

\[ fl(x) = 0.71428 \times 10^0 \quad \text{and} \quad fl(y) = 0.33333 \times 10^0. \]

Thus

\[ x \oplus y = fl(fl(x) + fl(y)) = fl(0.71428 \times 10^0 + 0.33333 \times 10^0) = fl(1.04761 \times 10^0) = 0.10476 \times 10^1. \]

The true value is $x + y = \frac{5}{7} + \frac{1}{3} = \frac{22}{21}$, so we have

\[ \text{Absolute Error} = \left| \frac{22}{21} - 0.10476 \times 10^1 \right| = 0.190 \times 10^{-4} \]

and

\[ \text{Relative Error} = \left| \frac{0.190 \times 10^{-4}}{22/21} \right| = 0.182 \times 10^{-4}. \]

Table 1.2 lists the values of this and the other calculations.
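Table 1.2

Operation       Result                   Actual value   Absolute error             Relative error
$x \oplus y$    $0.10476 \times 10^1$    22/21          $0.190 \times 10^{-4}$     $0.182 \times 10^{-4}$
$x \ominus y$   $0.38095 \times 10^0$    8/21           $0.238 \times 10^{-5}$     $0.625 \times 10^{-5}$
$x \otimes y$   $0.23809 \times 10^0$    5/21           $0.524 \times 10^{-5}$     $0.220 \times 10^{-4}$
$x \oslash y$   $0.21428 \times 10^1$    15/7           $0.571 \times 10^{-4}$     $0.267 \times 10^{-4}$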
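The entries of Table 1.2 can be regenerated with a small model of five-digit chopping arithmetic. The Python sketch below is our own and is not the Maple procedure the text refers to; `fl` is a hypothetical helper that chops an exact rational value, computed with 40 guard digits, to five significant decimal digits.

```python
from decimal import Context, Decimal, ROUND_DOWN
from fractions import Fraction

EXACT = Context(prec=40)                       # guard digits for the decimal expansion
CHOP5 = Context(prec=5, rounding=ROUND_DOWN)   # five-digit chopping

def fl(value):
    """Five-digit chopped form of an exact rational value, returned as a Fraction."""
    value = Fraction(value)
    digits = EXACT.divide(Decimal(value.numerator), Decimal(value.denominator))
    return Fraction(CHOP5.create_decimal(digits))

x, y = Fraction(5, 7), Fraction(1, 3)
fx, fy = fl(x), fl(y)
table = {
    "x (+) y": (fl(fx + fy), x + y),
    "x (-) y": (fl(fx - fy), x - y),
    "x (*) y": (fl(fx * fy), x * y),
    "x (/) y": (fl(fx / fy), x / y),
}
for name, (result, actual) in table.items():
    absolute = abs(actual - result)
    print(name, float(result), f"{float(absolute):.3e}", f"{float(absolute / actual):.3e}")
```

Working in `Fraction` keeps the "exact arithmetic on the floating-point representations" step exact, so the only error introduced is the final chop.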


The maximum relative error for the operations in Example 3 is $0.267 \times 10^{-4}$, so the arithmetic produces satisfactory five-digit results. This is not the case in the following example.


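Example 4 Suppose that in addition to $x = \frac{5}{7}$ and $y = \frac{1}{3}$ we have

\[ u = 0.714251, \quad v = 98765.9, \quad \text{and} \quad w = 0.111111 \times 10^{-4}, \]

so that

\[ fl(u) = 0.71425 \times 10^0, \quad fl(v) = 0.98765 \times 10^5, \quad \text{and} \quad fl(w) = 0.11111 \times 10^{-4}. \]

Determine the five-digit chopping values of $x \ominus u$, $(x \ominus u) \oslash w$, $(x \ominus u) \otimes v$, and $u \oplus v$.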
Solution These numbers were chosen to illustrate some problems that can arise with finite-digit arithmetic. Because $x$ and $u$ are nearly the same, their difference is small. The absolute error for $x \ominus u$ is

\[ |(x - u) - (x \ominus u)| = |(x - u) - fl(fl(x) - fl(u))| \]
\[ = \left| \left( \frac{5}{7} - 0.714251 \right) - fl(0.71428 \times 10^0 - 0.71425 \times 10^0) \right| \]
\[ = |0.347143 \times 10^{-4} - fl(0.00003 \times 10^0)| = 0.47143 \times 10^{-5}. \]

This approximation has a small absolute error, but a large relative error

\[ \left| \frac{0.47143 \times 10^{-5}}{0.347143 \times 10^{-4}} \right| \le 0.136. \]

The subsequent division by the small number $w$ or multiplication by the large number $v$ magnifies the absolute error without modifying the relative error. The addition of the large and small numbers $u$ and $v$ produces large absolute error but not large relative error. These calculations are shown in Table 1.3.
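Table 1.3

Operation                  Result                     Actual value               Absolute error            Relative error
$x \ominus u$              $0.30000 \times 10^{-4}$   $0.34714 \times 10^{-4}$   $0.471 \times 10^{-5}$    0.136
$(x \ominus u) \oslash w$  $0.27000 \times 10^1$      $0.31242 \times 10^1$      0.424                     0.136
$(x \ominus u) \otimes v$  $0.29629 \times 10^1$      $0.34285 \times 10^1$      0.465                     0.136
$u \oplus v$               $0.98765 \times 10^5$      $0.98766 \times 10^5$      $0.161 \times 10^1$       $0.163 \times 10^{-4}$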
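The loss of accuracy in Example 4 can be reproduced with the same kind of model of five-digit chopping. This Python sketch is our own; `fl` and `relative_error` are hypothetical helpers, and exact values are carried as `Fraction` objects so that only the chopping step introduces error.

```python
from decimal import Context, Decimal, ROUND_DOWN
from fractions import Fraction

EXACT = Context(prec=40)                       # guard digits for the decimal expansion
CHOP5 = Context(prec=5, rounding=ROUND_DOWN)   # five-digit chopping

def fl(value):
    """Five-digit chopped form of an exact rational value, returned as a Fraction."""
    value = Fraction(value)
    digits = EXACT.divide(Decimal(value.numerator), Decimal(value.denominator))
    return Fraction(CHOP5.create_decimal(digits))

def relative_error(result, actual):
    return float(abs(actual - result) / abs(actual))

x = Fraction(5, 7)
u = Fraction("0.714251")
v = Fraction("98765.9")
w = Fraction("0.0000111111")

diff = fl(fl(x) - fl(u))          # x (-) u: subtraction of nearly equal numbers
quotient = fl(diff / fl(w))       # (x (-) u) (/) w: division by a small number
product = fl(diff * fl(v))        # (x (-) u) (*) v: multiplication by a large number
total = fl(fl(u) + fl(v))         # u (+) v: addition of large and small numbers

print(float(diff), relative_error(diff, x - u))
print(float(quotient), relative_error(quotient, (x - u) / w))
print(float(product), relative_error(product, (x - u) * v))
print(float(total), relative_error(total, u + v))
```

The relative error of about 0.136 introduced by the subtraction carries through the division and the multiplication essentially unchanged, while the absolute error grows with the size of the result.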


One of the most common error-producing calculations involves the cancelation of 
significant digits due to the subtraction of nearly equal numbers. Suppose two nearly equal 
numbers x and y, with x > y, have the k-digit representations 


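\[ fl(x) = 0.d_1 d_2 \ldots d_p \alpha_{p+1} \alpha_{p+2} \ldots \alpha_k \times 10^n, \]

and

\[ fl(y) = 0.d_1 d_2 \ldots d_p \beta_{p+1} \beta_{p+2} \ldots \beta_k \times 10^n. \]

The floating-point form of $x - y$ is

\[ fl(fl(x) - fl(y)) = 0.\sigma_{p+1} \sigma_{p+2} \ldots \sigma_k \times 10^{n-p}, \]

where

\[ 0.\sigma_{p+1} \sigma_{p+2} \ldots \sigma_k = 0.\alpha_{p+1} \alpha_{p+2} \ldots \alpha_k - 0.\beta_{p+1} \beta_{p+2} \ldots \beta_k. \]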


The floating-point number used to represent x — y has at most k — p digits of significance. 
However, in most calculation devices, x — y will be assigned k digits, with the last p being 
either zero or randomly assigned. Any further calculations involving x —y retain the problem 
of having only k — p digits of significance, since a chain of calculations is no more accurate 
than its weakest portion. 
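The loss of significance described above can be seen in a small sketch. The two values below are hypothetical (they do not come from the text), k = 5 digits are kept with rounding, and the helper name `fl` simply mirrors the notation of this section. Exact rational arithmetic from the `fractions` module is used only to measure the errors.

```python
from decimal import Decimal, localcontext, ROUND_HALF_UP
from fractions import Fraction

def fl(value, digits):
    """Round a decimal string to `digits` significant digits."""
    with localcontext() as ctx:
        ctx.prec = digits
        ctx.rounding = ROUND_HALF_UP
        return ctx.create_decimal(value)

# Hypothetical, nearly equal inputs that agree in their first p = 3 digits.
x_exact, y_exact = "0.123456", "0.123411"
k = 5

fx, fy = fl(x_exact, k), fl(y_exact, k)   # 0.12346 and 0.12341
computed = fx - fy                        # only k - p digits are meaningful
exact = Fraction(x_exact) - Fraction(y_exact)

rel_err_input = abs(Fraction(x_exact) - Fraction(fx)) / Fraction(x_exact)
rel_err_diff = abs(exact - Fraction(computed)) / abs(exact)

print("fl(x) - fl(y) =", computed)
print("relative error in fl(x):", float(rel_err_input))
print("relative error in the difference:", float(rel_err_diff))
```

With these inputs the relative error of each rounded operand is of order $10^{-5}$, while the relative error of the computed difference is 1/9.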

If a finite-digit representation or calculation introduces an error, further enlargement of
the error occurs when dividing by a number with small magnitude (or, equivalently, when
multiplying by a number with large magnitude). Suppose, for example, that the number z
has the finite-digit approximation $z + \delta$, where the error $\delta$ is introduced by representation
or by previous calculation. Now divide by $\varepsilon = 10^{-n}$, where n > 0. Then

$$\frac{z}{\varepsilon} \approx fl\left( \frac{fl(z)}{fl(\varepsilon)} \right) = (z + \delta) \times 10^{n}.$$

The absolute error in this approximation, $|\delta| \times 10^{n}$, is the original absolute error, $|\delta|$,
multiplied by the factor $10^{n}$.

Example 5 Let p = 0.54617 and q = 0.54601. Use four-digit arithmetic to approximate p − q and
determine the absolute and relative errors using (a) rounding and (b) chopping.

Solution The exact value of r = p − q is r = 0.00016.

(a) Suppose the subtraction is performed using four-digit rounding arithmetic. Rounding
p and q to four digits gives p* = 0.5462 and q* = 0.5460, respectively, and
r* = p* − q* = 0.0002 is the four-digit approximation to r. Since

$$\frac{|r - r^*|}{|r|} = \frac{|0.00016 - 0.0002|}{|0.00016|} = 0.25,$$

the result has only one significant digit, whereas p* and q* were accurate to four
and five significant digits, respectively.

(b) If chopping is used to obtain the four digits, the four-digit approximations to p, q,
and r are p* = 0.5461, q* = 0.5460, and r* = p* − q* = 0.0001. This gives

$$\frac{|r - r^*|}{|r|} = \frac{|0.00016 - 0.0001|}{|0.00016|} = 0.375,$$

which also results in only one significant digit of accuracy.

The loss of accuracy due to round-off error can often be avoided by a reformulation of
the calculations, as illustrated in the next example.

Margin note: The roots $x_1$ and $x_2$ of a general quadratic equation are related to
the coefficients by the fact that $x_1 + x_2 = -\frac{b}{a}$ and $x_1 x_2 = \frac{c}{a}$.
This is a special case of Viète's Formulas for the coefficients of polynomials.

Illustration The quadratic formula states that the roots of $ax^2 + bx + c = 0$, when $a \ne 0$, are

$$x_1 = \frac{-b + \sqrt{b^2 - 4ac}}{2a} \quad \text{and} \quad x_2 = \frac{-b - \sqrt{b^2 - 4ac}}{2a}. \tag{1.1}$$


Consider this formula applied to the equation $x^2 + 62.10x + 1 = 0$, whose roots are
approximately

$$x_1 = -0.01610723 \quad \text{and} \quad x_2 = -62.08390.$$

We will again use four-digit rounding arithmetic in the calculations to determine the root. In
this equation, $b^2$ is much larger than $4ac$, so the numerator in the calculation for $x_1$ involves
the subtraction of nearly equal numbers. Because

$$\sqrt{b^2 - 4ac} = \sqrt{(62.10)^2 - (4.000)(1.000)(1.000)} = \sqrt{3856. - 4.000} = \sqrt{3852.} = 62.06,$$

we have

$$fl(x_1) = \frac{-62.10 + 62.06}{2.000} = \frac{-0.04000}{2.000} = -0.02000,$$

a poor approximation to $x_1 = -0.01611$, with the large relative error

$$\frac{|-0.01611 + 0.02000|}{|-0.01611|} \approx 2.4 \times 10^{-1}.$$

On the other hand, the calculation for $x_2$ involves the addition of the nearly equal numbers
$-b$ and $-\sqrt{b^2 - 4ac}$. This presents no problem since

$$fl(x_2) = \frac{-62.10 - 62.06}{2.000} = \frac{-124.2}{2.000} = -62.10$$

has the small relative error

$$\frac{|-62.08 + 62.10|}{|-62.08|} \approx 3.2 \times 10^{-4}.$$

To obtain a more accurate four-digit rounding approximation for $x_1$, we change the form of
the quadratic formula by rationalizing the numerator:

$$x_1 = \frac{-b + \sqrt{b^2 - 4ac}}{2a} \left( \frac{-b - \sqrt{b^2 - 4ac}}{-b - \sqrt{b^2 - 4ac}} \right) = \frac{b^2 - (b^2 - 4ac)}{2a\left(-b - \sqrt{b^2 - 4ac}\right)},$$

which simplifies to an alternate quadratic formula

$$x_1 = \frac{-2c}{b + \sqrt{b^2 - 4ac}}. \tag{1.2}$$

Using (1.2) gives

$$fl(x_1) = \frac{-2.000}{62.10 + 62.06} = \frac{-2.000}{124.2} = -0.01610,$$

which has the small relative error $6.2 \times 10^{-4}$.


The rationalization technique can also be applied to give the following alternative quadratic
formula for $x_2$:

$$x_2 = \frac{-2c}{b - \sqrt{b^2 - 4ac}}. \tag{1.3}$$

This is the form to use if b is a negative number. In the Illustration, however, the mistaken use
of this formula for $x_2$ would result in not only the subtraction of nearly equal numbers, but
also the division by the small result of this subtraction. The inaccuracy that this combination
produces,

$$fl(x_2) = \frac{-2c}{b - \sqrt{b^2 - 4ac}} = \frac{-2.000}{62.10 - 62.06} = \frac{-2.000}{0.04000} = -50.00,$$

has the large relative error $1.9 \times 10^{-1}$.

• The lesson: Think before you compute!
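The four-digit calculations for $x^2 + 62.10x + 1 = 0$ can be replayed with the following sketch. It assumes Python's `decimal` module with a precision of four digits and `ROUND_HALF_UP` as a model of four-digit rounding arithmetic; the function name `quadratic_variants` and the dictionary keys are illustrative, not notation from the text.

```python
from decimal import Decimal, localcontext, ROUND_HALF_UP

def quadratic_variants(a, b, c, digits=4):
    """Evaluate formulas (1.1), (1.2), and (1.3) in `digits`-digit rounding arithmetic."""
    with localcontext() as ctx:
        ctx.prec = digits
        ctx.rounding = ROUND_HALF_UP
        a, b, c = (ctx.create_decimal(s) for s in (a, b, c))
        root = (b * b - 4 * a * c).sqrt()   # each operation is rounded to 4 digits
        return {
            "x1 by (1.1)": (-b + root) / (2 * a),   # subtracts nearly equal numbers
            "x2 by (1.1)": (-b - root) / (2 * a),
            "x1 by (1.2)": (-2 * c) / (b + root),   # rationalized form for x1
            "x2 by (1.3)": (-2 * c) / (b - root),   # misuse when b > 0
        }

values = quadratic_variants("1", "62.10", "1")
for label, value in values.items():
    print(label, value)
```

For b > 0 the pairing of (1.2) for $x_1$ with (1.1) for $x_2$ avoids the subtraction of nearly equal numbers; for b < 0 the roles are reversed, as noted above.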


Nested Arithmetic 


Accuracy loss due to round-off error can also be reduced by rearranging calculations, as 
shown in the next example. 


Example 6 Evaluate $f(x) = x^3 - 6.1x^2 + 3.2x + 1.5$ at $x = 4.71$ using three-digit arithmetic.

Solution Table 1.4 gives the intermediate results in the calculations.

Table 1.4

| | $x$ | $x^2$ | $x^3$ | $6.1x^2$ | $3.2x$ |
|---|---|---|---|---|---|
| Exact | 4.71 | 22.1841 | 104.487111 | 135.32301 | 15.072 |
| Three-digit (chopping) | 4.71 | 22.1 | 104. | 134. | 15.0 |
| Three-digit (rounding) | 4.71 | 22.2 | 105. | 135. | 15.1 |

Margin note: Remember that chopping (or rounding) is performed after each calculation.

To illustrate the calculations, let us look at those involved with finding $x^3$ using three-digit
rounding arithmetic. First we find

$$x^2 = 4.71^2 = 22.1841 \quad \text{which rounds to } 22.2.$$

Then we use this value of $x^2$ to find

$$x^3 = x^2 \cdot x = 22.2 \cdot 4.71 = 104.562 \quad \text{which rounds to } 105.$$

Also,

$$6.1x^2 = 6.1(22.2) = 135.42 \quad \text{which rounds to } 135,$$

and

$$3.2x = 3.2(4.71) = 15.072 \quad \text{which rounds to } 15.1.$$

The exact result of the evaluation is

$$\text{Exact:} \quad f(4.71) = 104.487111 - 135.32301 + 15.072 + 1.5 = -14.263899.$$

Using finite-digit arithmetic, the way in which we add the results can affect the final result.
Suppose that we add left to right. Then for chopping arithmetic we have

$$\text{Three-digit (chopping):} \quad f(4.71) = ((104. - 134.) + 15.0) + 1.5 = -13.5,$$

and for rounding arithmetic we have

$$\text{Three-digit (rounding):} \quad f(4.71) = ((105. - 135.) + 15.1) + 1.5 = -13.4.$$

(You should carefully verify these results to be sure that your notion of finite-digit arithmetic
is correct.) Note that the three-digit chopping values simply retain the leading three digits,
with no rounding involved, and differ significantly from the three-digit rounding values.
The relative errors for the three-digit methods are

$$\text{Chopping:} \quad \left| \frac{-14.263899 + 13.5}{-14.263899} \right| \approx 0.05, \quad \text{and} \quad \text{Rounding:} \quad \left| \frac{-14.263899 + 13.4}{-14.263899} \right| \approx 0.06.$$

Illustration As an alternative approach, the polynomial f(x) in Example 6 can be written in a nested
manner as

$$f(x) = x^3 - 6.1x^2 + 3.2x + 1.5 = ((x - 6.1)x + 3.2)x + 1.5.$$

Using three-digit chopping arithmetic now produces

$$\begin{aligned}
f(4.71) &= ((4.71 - 6.1)4.71 + 3.2)4.71 + 1.5 = ((-1.39)(4.71) + 3.2)4.71 + 1.5 \\
&= (-6.54 + 3.2)4.71 + 1.5 = (-3.34)4.71 + 1.5 = -15.7 + 1.5 = -14.2.
\end{aligned}$$

In a similar manner, we now obtain a three-digit rounding answer of −14.3. The new relative
errors are

$$\text{Three-digit (chopping):} \quad \left| \frac{-14.263899 + 14.2}{-14.263899} \right| \approx 0.0045;$$

$$\text{Three-digit (rounding):} \quad \left| \frac{-14.263899 + 14.3}{-14.263899} \right| \approx 0.0025.$$


Nesting has reduced the relative error for the chopping approximation to less than 10% 
of that obtained initially. For the rounding approximation the improvement has been even 
more dramatic; the error in this case has been reduced by more than 95%. 


Polynomials should always be expressed in nested form before performing an evalu- 
ation, because this form minimizes the number of arithmetic calculations. The decreased 
error in the Illustration is due to the reduction in computations from four multiplications 
and three additions to two multiplications and three additions. One way to reduce round-off 
error is to reduce the number of computations. 
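The comparison between term-by-term and nested evaluation can be reproduced with the sketch below. It assumes Python's `decimal` module as a model of three-digit arithmetic, with `ROUND_DOWN` standing in for chopping and `ROUND_HALF_UP` for rounding; the names `term_by_term`, `nested`, and `COEFFS` are illustrative.

```python
from decimal import Decimal, localcontext, ROUND_DOWN, ROUND_HALF_UP

# Coefficients of f(x) = x^3 - 6.1x^2 + 3.2x + 1.5, highest power first.
COEFFS = ["1", "-6.1", "3.2", "1.5"]

def term_by_term(x, rounding, digits=3):
    """Form x^2, x^3, 6.1x^2, 3.2x separately, then add left to right."""
    with localcontext() as ctx:
        ctx.prec, ctx.rounding = digits, rounding
        x = Decimal(x)
        x2 = x * x
        x3 = x2 * x
        return ((x3 - Decimal("6.1") * x2) + Decimal("3.2") * x) + Decimal("1.5")

def nested(x, rounding, digits=3):
    """Evaluate ((x - 6.1)x + 3.2)x + 1.5 by nested multiplication."""
    with localcontext() as ctx:
        ctx.prec, ctx.rounding = digits, rounding
        x = Decimal(x)
        result = Decimal(COEFFS[0])
        for coeff in COEFFS[1:]:
            result = result * x + Decimal(coeff)
        return result

for label, mode in (("chopping", ROUND_DOWN), ("rounding", ROUND_HALF_UP)):
    print(label, term_by_term("4.71", mode), nested("4.71", mode))
```

Each intermediate product and sum is reduced to three digits before it is used again, matching the convention that chopping (or rounding) is performed after each calculation.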


EXERCISE SET 1.2 


1. Compute the absolute error and relative error in approximations of p by p*.

a. $p = \pi$, $p^* = 22/7$                b. $p = \pi$, $p^* = 3.1416$
c. $p = e$, $p^* = 2.718$                 d. $p = \sqrt{2}$, $p^* = 1.414$
e. $p = e^{10}$, $p^* = 22000$            f. $p = 10^{\pi}$, $p^* = 1400$
g. $p = 8!$, $p^* = 39900$                h. $p = 9!$, $p^* = \sqrt{18\pi}\,(9/e)^9$


2. Find the largest interval in which p* must lie to approximate p with relative error at most 10~+ for 
each value of p. 


a 71 b. e 
c /2 da. V7 

3. Suppose p* must approximate p with relative error at most $10^{-3}$. Find the largest interval in which
p* must lie for each value of p.
a. 150                b. 900
c. 1500               d. 90

4. Perform the following computations (i) exactly, (ii) using three-digit chopping arithmetic, and (iii) 
using three-digit rounding arithmetic. (iv) Compute the relative errors in parts (ii) and (iii). 


1 4 1 

a. 5373 b. 5°3 
ae ee 
3 Il 20 3 11 20 


5. Use three-digit rounding arithmetic to perform the following calculations. Compute the absolute error 
and relative error with the exact value determined to at least five digits. 


a. 133+0.921 b. 133 — 0.499 
ce. (121 — 0.327) — 119 d. (121 — 119) — 0.327 
13 6 
aes, f —107+6e—— 
2e—5.4 62 


h. 


ge 
o~ 
ols 
nt 
a 
~I| \o 
ee 
) 
-| | 
x8 


6. Repeat Exercise 5 using four-digit rounding arithmetic. 
7. Repeat Exercise 5 using three-digit chopping arithmetic. 


8. Repeat Exercise 5 using four-digit chopping arithmetic.

9. The first three nonzero terms of the Maclaurin series for the arctangent function are $x - (1/3)x^3 + (1/5)x^5$.
Compute the absolute error and relative error in the following approximations of $\pi$ using the
polynomial in place of the arctangent:

a. $4\left[ \arctan\left( \frac{1}{2} \right) + \arctan\left( \frac{1}{3} \right) \right]$
b. $16 \arctan\left( \frac{1}{5} \right) - 4 \arctan\left( \frac{1}{239} \right)$


10. The number e can be defined by $e = \sum_{n=0}^{\infty} (1/n!)$, where $n! = n(n-1) \cdots 2 \cdot 1$ for $n \ne 0$ and $0! = 1$.
Compute the absolute error and relative error in the following approximations of e:

a. $\sum_{n=0}^{5} \frac{1}{n!}$                b. $\sum_{n=0}^{10} \frac{1}{n!}$

11. Let

$$f(x) = \frac{x \cos x - \sin x}{x - \sin x}.$$

a. Find $\lim_{x \to 0} f(x)$.
b. Use four-digit rounding arithmetic to evaluate f(0.1).
c. Replace each trigonometric function with its third Maclaurin polynomial, and repeat part (b).
d. The actual value is f(0.1) = −1.99899998. Find the relative error for the values obtained in
parts (b) and (c).

12. Let

$$f(x) = \frac{e^x - e^{-x}}{x}.$$

a. Find $\lim_{x \to 0} (e^x - e^{-x})/x$.
b. Use three-digit rounding arithmetic to evaluate f(0.1).
c. Replace each exponential function with its third Maclaurin polynomial, and repeat part (b).
d. The actual value is f(0.1) = 2.003335000. Find the relative error for the values obtained in
parts (b) and (c).
13. Use four-digit rounding arithmetic and the formulas (1.1), (1.2), and (1.3) to find the most accurate
approximations to the roots of the following quadratic equations. Compute the absolute errors and
relative errors.


a ye 13 bg 

. 37 aq 6 

» 124 18, 1 Lg 

) Xb He SS 
3 4 6 


c. $1.002x^2 - 11.01x + 0.01265 = 0$
d. $1.002x^2 + 11.01x + 0.01265 = 0$

14. Repeat Exercise 13 using four-digit chopping arithmetic.

15. Use the 64-bit long real format to find the decimal equivalent of the following floating-point machine
numbers.

a. 0 10000001010 1001001100000000000000000000000000000000000000000000
b. 1 10000001010 1001001100000000000000000000000000000000000000000000
c. 0 01111111111 0101001100000000000000000000000000000000000000000000
d. 0 01111111111 0101001100000000000000000000000000000000000000000001

16. Find the next largest and smallest machine numbers in decimal form for the numbers given in Exercise 15.


17. Suppose two points $(x_0, y_0)$ and $(x_1, y_1)$ are on a straight line with $y_1 \ne y_0$. Two formulas are available
to find the x-intercept of the line:

$$x = \frac{x_0 y_1 - x_1 y_0}{y_1 - y_0} \quad \text{and} \quad x = x_0 - \frac{(x_1 - x_0) y_0}{y_1 - y_0}.$$

a. Show that both formulas are algebraically correct.

b. Use the data $(x_0, y_0) = (1.31, 3.24)$ and $(x_1, y_1) = (1.93, 4.76)$ and three-digit rounding
arithmetic to compute the x-intercept both ways. Which method is better and why?
18. The Taylor polynomial of degree n for $f(x) = e^x$ is $\sum_{i=0}^{n} (x^i/i!)$. Use the Taylor polynomial of degree
nine and three-digit chopping arithmetic to find an approximation to $e^{-5}$ by each of the following
methods.

c. An approximate value of $e^{-5}$ correct to three digits is $6.74 \times 10^{-3}$. Which formula, (a) or (b),
gives the most accuracy, and why?


19. The two-by-two linear system

$$\begin{aligned} ax + by &= e, \\ cx + dy &= f, \end{aligned}$$

where a, b, c, d, e, f are given, can be solved for x and y as follows:

set $m = \frac{c}{a}$, provided $a \ne 0$;
$d_1 = d - mb$;
$f_1 = f - me$;
$y = \frac{f_1}{d_1}$;
$x = \frac{(e - by)}{a}$.

Solve the following linear systems using four-digit rounding arithmetic.

a. 1.130x − 6.990y = 14.20                b. 8.110x + 12.20y = −0.1370
   1.013x − 6.099y = 14.22                   −18.11x + 112.2y = −0.1376

20. Repeat Exercise 19 using four-digit chopping arithmetic.
21. a. Show that the polynomial nesting technique described in Example 6 can also be applied to the
evaluation of

$$f(x) = 1.01e^{4x} - 4.62e^{3x} - 3.11e^{2x} + 12.2e^{x} - 1.99.$$

b. Use three-digit rounding arithmetic, the assumption that $e^{1.53} = 4.62$, and the fact that $e^{nx} = (e^x)^n$
to evaluate f(1.53) as given in part (a).

c. Redo the calculation in part (b) by first nesting the calculations.

d. Compare the approximations in parts (b) and (c) to the true three-digit result f(1.53) = −7.61.

22. A rectangular parallelepiped has sides of length 3 cm, 4 cm, and 5 cm, measured to the nearest
centimeter. What are the best upper and lower bounds for the volume of this parallelepiped? What
are the best upper and lower bounds for the surface area?

Let P,,(x) be the Maclaurin polynomial of degree n for the arctangent function. Use Maple carrying 

75 decimal digits to find the value of n required to approximate 7 to within 10~*° using the following 

formulas. 


a. $4\left[ P_n\left( \frac{1}{2} \right) + P_n\left( \frac{1}{3} \right) \right]$                b. $16 P_n\left( \frac{1}{5} \right) - 4 P_n\left( \frac{1}{239} \right)$

24. Suppose that fl(y) is a k-digit rounding approximation to y. Show that

$$\left| \frac{y - fl(y)}{y} \right| \le 0.5 \times 10^{-k+1}.$$

[Hint: If $d_{k+1} < 5$, then $fl(y) = 0.d_1 d_2 \ldots d_k \times 10^{n}$. If $d_{k+1} \ge 5$, then $fl(y) = 0.d_1 d_2 \ldots d_k \times 10^{n} + 10^{n-k}$.]

25. The binomial coefficient

$$\binom{m}{k} = \frac{m!}{k!\,(m-k)!}$$

describes the number of ways of choosing a subset of k objects from a set of m elements.


a. Suppose decimal machine numbers are of the form

$$\pm 0.d_1 d_2 d_3 d_4 \times 10^{n}, \quad \text{with } 1 \le d_1 \le 9, \quad 0 \le d_i \le 9, \text{ if } i = 2, 3, 4 \text{ and } |n| \le 15.$$

What is the largest value of m for which the binomial coefficient $\binom{m}{k}$ can be computed for all k
by the definition without causing overflow?

b. Show that $\binom{m}{k}$ can also be computed by

$$\binom{m}{k} = \left( \frac{m}{k} \right) \left( \frac{m-1}{k-1} \right) \cdots \left( \frac{m-k+1}{1} \right).$$

c. What is the largest value of m for which the binomial coefficient $\binom{m}{3}$ can be computed by the
formula in part (b) without causing overflow?

d. Use the equation in (b) and four-digit chopping arithmetic to compute the number of possible
5-card hands in a 52-card deck. Compute the actual and relative errors.
26. Let $f \in C[a, b]$ be a function whose derivative exists on (a, b). Suppose f is to be evaluated at $x_0$
in (a, b), but instead of computing the actual value $f(x_0)$, the approximate value, $\tilde{f}(x_0)$, is the actual
value of f at $x_0 + \epsilon$, that is, $\tilde{f}(x_0) = f(x_0 + \epsilon)$.

a. Use the Mean Value Theorem 1.8 to estimate the absolute error $|f(x_0) - \tilde{f}(x_0)|$ and the relative
error $|f(x_0) - \tilde{f}(x_0)| / |f(x_0)|$, assuming $f(x_0) \ne 0$.
b. Ife =5 x 10~° and x9 = 1, find bounds for the absolute and relative errors for 
i fMm=e 
ii. f(x) =sinx 
c. Repeat part (b) with e = (5 x 10~°%)xp and xp = 10. 
27. The following Maple procedure chops a floating-point number x to t digits. (Use the Shift and Enter
keys at the end of each line when creating the procedure.)

    chop := proc(x, t);
      local e, x2;
      if x = 0 then 0
      else
        e := ceil(evalf(log10(abs(x))));
        x2 := evalf(trunc(x * 10^(t - e)) * 10^(e - t));
      end if
    end;

Verify the procedure works for the following values.

a. x = 124.031, t = 5                b. x = 124.036, t = 5
c. x = −124.031, t = 5               d. x = −124.036, t = 5
e. x = 0.00653, t = 2                f. x = 0.00656, t = 2
g. x = −0.00653, t = 2               h. x = −0.00656, t = 2

28. The opening example to this chapter described a physical experiment involving the temperature of a
gas under pressure. In this application, we were given P = 1.00 atm, V = 0.100 m³, N = 0.00420 mol,
and R = 0.08206. Solving for T in the ideal gas law gives

$$T = \frac{PV}{NR} = \frac{(1.00)(0.100)}{(0.00420)(0.08206)} = 290.15 \text{ K} = 17^{\circ}\text{C}.$$

In the laboratory, it was found that T was 15°C under these conditions, and when the pressure was
doubled and the volume halved, T was 19°C. Assume that the data are rounded values accurate to the
places given, and show that both laboratory figures are within the bounds of accuracy for the ideal
gas law.


1.3 Algorithms and Convergence


Throughout the text we will be examining approximation procedures, called algorithms, 
involving sequences of calculations. An algorithm is a procedure that describes, in an 
unambiguous manner, a finite sequence of steps to be performed in a specified order. The 
object of the algorithm is to implement a procedure to solve a problem or approximate a 
solution to the problem. 

We use a pseudocode to describe the algorithms. This pseudocode specifies the form
of the input to be supplied and the form of the desired output. Not all numerical procedures
give satisfactory output for arbitrarily chosen input. As a consequence, a stopping technique
independent of the numerical technique is incorporated into each algorithm to avoid infinite
loops.

Margin note: The use of an algorithm is as old as formal mathematics, but the name derives
from the Arabic mathematician Muhammad ibn Musa al-Khwarizmi (c. 780–850). The Latin
translation of his works begins with the words “Dixit Algorismi” meaning “al-Khwarizmi says.”

Two punctuation symbols are used in the algorithms:

• a period (.) indicates the termination of a step,

• a semicolon (;) separates tasks within a step.


Indentation is used to indicate that groups of statements are to be treated as a single entity.
Looping techniques in the algorithms are either counter-controlled, such as,

    For i = 1, 2, ..., n
        Set x_i = a + i · h

or condition-controlled, such as

    While i < N do Steps 3–6.

To allow for conditional execution, we use the standard

    If ... then        or        If ... then
                                     else

constructions.

The steps in the algorithms follow the rules of structured program construction. They 
have been arranged so that there should be minimal difficulty translating pseudocode into 
any programming language suitable for scientific applications. 

The algorithms are liberally laced with comments. These are written in italics and
contained within parentheses to distinguish them from the algorithmic statements.

Illustration The following algorithm computes $x_1 + x_2 + \cdots + x_N = \sum_{i=1}^{N} x_i$, given N and the numbers
$x_1, x_2, \ldots, x_N$.

INPUT N, $x_1, x_2, \ldots, x_N$.

OUTPUT SUM = $\sum_{i=1}^{N} x_i$.

Step 1 Set SUM = 0. (Initialize accumulator.)

Step 2 For i = 1, 2, ..., N do
           set SUM = SUM + $x_i$. (Add the next term.)

Step 3 OUTPUT (SUM);
       STOP.
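As an indication of how directly the pseudocode translates into a programming language, the summation algorithm might be written in Python as follows. This is only a sketch; the function name `sum_terms` is an assumption, and the list `xs` plays the role of the inputs $x_1, \ldots, x_N$ (its length supplies N).

```python
def sum_terms(xs):
    """Return x_1 + x_2 + ... + x_N for the numbers in xs."""
    total = 0.0            # Step 1: initialize accumulator
    for x in xs:           # Step 2: for i = 1, 2, ..., N
        total = total + x  #         add the next term
    return total           # Step 3: output SUM

print(sum_terms([1.0, 2.0, 3.5]))
```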


Example 1 The Nth Taylor polynomial for f(x) = ln x expanded about $x_0 = 1$ is

$$P_N(x) = \sum_{i=1}^{N} \frac{(-1)^{i+1}}{i} (x - 1)^{i},$$

and the value of ln 1.5 to eight decimal places is 0.40546511. Construct an algorithm to
determine the minimal value of N required for

$$|\ln 1.5 - P_N(1.5)| < 10^{-5},$$

without using the Taylor polynomial remainder term.

Solution From calculus we know that if $\sum_{n=1}^{\infty} a_n$ is an alternating series with limit A whose
terms decrease in magnitude, then A and the Nth partial sum $A_N = \sum_{n=1}^{N} a_n$ differ by less
than the magnitude of the (N + 1)st term; that is,

$$|A - A_N| \le |a_{N+1}|.$$

The following algorithm uses this bound.


INPUT value x, tolerance TOL, maximum number of iterations M.
OUTPUT degree N of the polynomial or a message of failure.

Step 1 Set N = 1;
           y = x − 1;
           SUM = 0;
           POWER = y;
           TERM = y;
           SIGN = −1. (Used to implement alternation of signs.)

Step 2 While N ≤ M do Steps 3–5.

    Step 3 Set SIGN = −SIGN; (Alternate the signs.)
               SUM = SUM + SIGN · TERM; (Accumulate the terms.)
               POWER = POWER · y;
               TERM = POWER/(N + 1). (Calculate the next term.)

    Step 4 If |TERM| < TOL then (Test for accuracy.)
               OUTPUT (N);
               STOP. (The procedure was successful.)

    Step 5 Set N = N + 1. (Prepare for the next iteration.)

Step 6 OUTPUT (‘Method Failed’); (The procedure was unsuccessful.)
       STOP.


The input for our problem is x = 1.5, TOL = $10^{-5}$, and perhaps M = 15. This choice
of M provides an upper bound for the number of calculations we are willing to perform,
recognizing that the algorithm is likely to fail if this bound is exceeded. Whether the output
is a value for N or the failure message depends on the precision of the computational
device.
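The steps of this algorithm can be sketched in Python as follows. The function name `taylor_degree` is an assumption, ordinary double-precision arithmetic is used, and the partial sum is returned together with N purely so that the result can be inspected; `None` stands in for the failure message.

```python
import math

def taylor_degree(x, tol, max_terms):
    """Return (N, SUM) once the next term is below tol, or None on failure."""
    n = 1                       # Step 1
    y = x - 1.0
    total = 0.0
    power = y
    term = y
    sign = -1                   # used to implement alternation of signs
    while n <= max_terms:       # Step 2
        sign = -sign            # Step 3: alternate the signs
        total += sign * term    #         accumulate the terms
        power *= y
        term = power / (n + 1)  #         calculate the next term
        if abs(term) < tol:     # Step 4: test for accuracy
            return n, total
        n += 1                  # Step 5: prepare for the next iteration
    return None                 # Step 6: method failed

result = taylor_degree(1.5, 1e-5, 15)
print(result)
if result is not None:
    print(abs(math.log(1.5) - result[1]))
```

Because the stopping test uses the alternating-series bound rather than the true error, the value of N returned is a value for which the bound guarantees the accuracy, not necessarily the smallest N for which the accuracy is attained.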


Characterizing Algorithms 


We will be considering a variety of approximation problems throughout the text, and in each 
case we need to determine approximation methods that produce dependably accurate results 
for a wide class of problems. Because of the differing ways in which the approximation 
methods are derived, we need a variety of conditions to categorize their accuracy. Not all 
of these conditions will be appropriate for any particular problem. 

One criterion we will impose on an algorithm whenever possible is that small changes 
in the initial data produce correspondingly small changes in the final results. An algorithm 
that satisfies this property is called stable; otherwise it is unstable. Some algorithms are 
stable only for certain choices of initial data, and are called conditionally stable. We will 
characterize the stability properties of algorithms whenever possible. 
To further consider the subject of round-off error growth and its connection to algorithm
stability, suppose an error with magnitude $E_0 > 0$ is introduced at some stage in the
calculations and that the magnitude of the error after n subsequent operations is denoted by
$E_n$. The two cases that arise most often in practice are defined as follows.

Margin note: The word stable has the same root as the words stand and standard. In
mathematics, the term stable applied to a problem indicates that a small change in
initial data or conditions does not result in a dramatic change in the
solution to the problem.

Definition 1.17 Suppose that $E_0 > 0$ denotes an error introduced at some stage in the calculations and $E_n$
represents the magnitude of the error after n subsequent operations.

• If $E_n \approx C n E_0$, where C is a constant independent of n, then the growth of error is
said to be linear.

• If $E_n \approx C^{n} E_0$, for some C > 1, then the growth of error is called exponential.

Linear growth of error is usually unavoidable, and when C and $E_0$ are small the results
are generally acceptable. Exponential growth of error should be avoided, because the term $C^{n}$
becomes large for even relatively small values of n. This leads to unacceptable inaccuracies,
regardless of the size of $E_0$. As a consequence, an algorithm that exhibits linear growth of
error is stable, whereas an algorithm exhibiting exponential error growth is unstable. (See
Figure 1.12.)


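Figure 1.12 Unstable exponential error growth, $E_n = C^{n} E_0$, and stable linear error growth, $E_n = C n E_0$.

Illustration For any constants $c_1$ and $c_2$,

$$p_n = c_1 \left( \frac{1}{3} \right)^{n} + c_2 3^{n}$$

is a solution to the recursive equation

$$p_n = \frac{10}{3} p_{n-1} - p_{n-2}, \quad \text{for } n = 2, 3, \ldots,$$

because

$$\begin{aligned}
\frac{10}{3} p_{n-1} - p_{n-2} &= \frac{10}{3} \left[ c_1 \left( \frac{1}{3} \right)^{n-1} + c_2 3^{n-1} \right] - \left[ c_1 \left( \frac{1}{3} \right)^{n-2} + c_2 3^{n-2} \right] \\
&= c_1 \left( \frac{1}{3} \right)^{n-2} \left[ \frac{10}{3} \cdot \frac{1}{3} - 1 \right] + c_2 3^{n-2} \left[ \frac{10}{3} \cdot 3 - 1 \right] \\
&= c_1 \left( \frac{1}{3} \right)^{n-2} \left( \frac{1}{9} \right) + c_2 3^{n-2} (9) = c_1 \left( \frac{1}{3} \right)^{n} + c_2 3^{n} = p_n.
\end{aligned}$$

Suppose that we are given $p_0 = 1$ and $p_1 = \frac{1}{3}$. This determines unique values for the
constants as $c_1 = 1$ and $c_2 = 0$. So $p_n = \left( \frac{1}{3} \right)^{n}$ for all n.

If five-digit rounding arithmetic is used to compute the terms of the sequence given by
this equation, then $\hat{p}_0 = 1.0000$ and $\hat{p}_1 = 0.33333$, which requires modifying the constants
to $\hat{c}_1 = 1.0000$ and $\hat{c}_2 = -0.12500 \times 10^{-5}$. The sequence $\{\hat{p}_n\}_{n=0}^{\infty}$ generated is then given
by

$$\hat{p}_n = 1.0000 \left( \frac{1}{3} \right)^{n} - 0.12500 \times 10^{-5} (3)^{n},$$

which has round-off error,

$$p_n - \hat{p}_n = 0.12500 \times 10^{-5} (3^{n}).$$

This procedure is unstable because the error grows exponentially with n, which is reflected
in the extreme inaccuracies after the first few terms, as shown in Table 1.5.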


Table 1.5

| n | Computed $\hat{p}_n$ | Correct $p_n$ | Relative Error |
|---|---|---|---|
| 0 | $0.10000 \times 10^{1}$ | $0.10000 \times 10^{1}$ | |
| 1 | $0.33333 \times 10^{0}$ | $0.33333 \times 10^{0}$ | |
| 2 | $0.11110 \times 10^{0}$ | $0.11111 \times 10^{0}$ | $9 \times 10^{-5}$ |
| 3 | $0.37000 \times 10^{-1}$ | $0.37037 \times 10^{-1}$ | $1 \times 10^{-3}$ |
| 4 | $0.12230 \times 10^{-1}$ | $0.12346 \times 10^{-1}$ | $9 \times 10^{-3}$ |
| 5 | $0.37660 \times 10^{-2}$ | $0.41152 \times 10^{-2}$ | $8 \times 10^{-2}$ |
| 6 | $0.32300 \times 10^{-3}$ | $0.13717 \times 10^{-2}$ | $8 \times 10^{-1}$ |
| 7 | $-0.26893 \times 10^{-2}$ | $0.45725 \times 10^{-3}$ | $7 \times 10^{0}$ |
| 8 | $-0.92872 \times 10^{-2}$ | $0.15242 \times 10^{-3}$ | $6 \times 10^{1}$ |

Now consider this recursive equation:

$$p_n = 2p_{n-1} - p_{n-2}, \quad \text{for } n = 2, 3, \ldots.$$

It has the solution $p_n = c_1 + c_2 n$ for any constants $c_1$ and $c_2$, because

$$\begin{aligned}
2p_{n-1} - p_{n-2} &= 2(c_1 + c_2(n - 1)) - (c_1 + c_2(n - 2)) \\
&= c_1(2 - 1) + c_2(2n - 2 - n + 2) = c_1 + c_2 n = p_n.
\end{aligned}$$

If we are given $p_0 = 1$ and $p_1 = \frac{1}{3}$, then constants in this equation are uniquely determined
to be $c_1 = 1$ and $c_2 = -\frac{2}{3}$. This implies that $p_n = 1 - \frac{2}{3}n$.

If five-digit rounding arithmetic is used to compute the terms of the sequence given by this
equation, then $\hat{p}_0 = 1.0000$ and $\hat{p}_1 = 0.33333$. As a consequence, the five-digit rounding
constants are $\hat{c}_1 = 1.0000$ and $\hat{c}_2 = -0.66667$. Thus

$$\hat{p}_n = 1.0000 - 0.66667n,$$

which has round-off error

$$p_n - \hat{p}_n = \left( 0.66667 - \frac{2}{3} \right) n.$$

This procedure is stable because the error grows linearly with n, which is reflected
in the approximations shown in Table 1.6.

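Table 1.6

| n | Computed $\hat{p}_n$ | Correct $p_n$ | Relative Error |
|---|---|---|---|
| 0 | $0.10000 \times 10^{1}$ | $0.10000 \times 10^{1}$ | |
| 1 | $0.33333 \times 10^{0}$ | $0.33333 \times 10^{0}$ | |
| 2 | $-0.33330 \times 10^{0}$ | $-0.33333 \times 10^{0}$ | $9 \times 10^{-5}$ |
| 3 | $-0.10000 \times 10^{1}$ | $-0.10000 \times 10^{1}$ | 0 |
| 4 | $-0.16667 \times 10^{1}$ | $-0.16667 \times 10^{1}$ | 0 |
| 5 | $-0.23334 \times 10^{1}$ | $-0.23333 \times 10^{1}$ | $4 \times 10^{-5}$ |
| 6 | $-0.30000 \times 10^{1}$ | $-0.30000 \times 10^{1}$ | 0 |
| 7 | $-0.36667 \times 10^{1}$ | $-0.36667 \times 10^{1}$ | 0 |
| 8 | $-0.43334 \times 10^{1}$ | $-0.43333 \times 10^{1}$ | $2 \times 10^{-5}$ |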
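The exponential error growth of the first recursion, $p_n = \frac{10}{3}p_{n-1} - p_{n-2}$, can be observed directly with the sketch below. It assumes Python's `decimal` module with five digits of precision and `ROUND_HALF_UP` as a model of five-digit rounding arithmetic, and it assumes that the coefficient 10/3 is itself rounded to 3.3333 before it is used; the function name `unstable_recurrence` is illustrative.

```python
from decimal import Decimal, localcontext, ROUND_HALF_UP

def unstable_recurrence(n_max, digits=5):
    """Iterate p_n = (10/3) p_{n-1} - p_{n-2} in `digits`-digit rounding arithmetic."""
    with localcontext() as ctx:
        ctx.prec = digits
        ctx.rounding = ROUND_HALF_UP
        ratio = Decimal(10) / Decimal(3)              # fl(10/3) = 3.3333
        p = [Decimal("1.0000"), Decimal("0.33333")]   # rounded starting values
        for n in range(2, n_max + 1):
            p.append(ratio * p[n - 1] - p[n - 2])     # rounded after each operation
        return p

computed = unstable_recurrence(8)
for n, value in enumerate(computed):
    exact = (1 / 3) ** n   # correct value p_n = (1/3)^n
    print(n, value, f"{exact:.5e}")
```

Under these assumptions the computed values change sign by n = 7, even though every correct value $(1/3)^n$ is positive.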


The effects of round-off error can be reduced by using high-order-digit arithmetic such 
as the double- or multiple-precision option available on most computers. Disadvantages in 
using double-precision arithmetic are that it takes more computation time and the growth 
of round-off error is not entirely eliminated. 

One approach to estimating round-off error is to use interval arithmetic (that is, to
retain the largest and smallest possible values at each step), so that, in the end, we obtain
an interval that contains the true value. Unfortunately, a very small interval may be needed
for reasonable implementation.


Rates of Convergence


Since iterative techniques involving sequences are often used, this section concludes with a
brief discussion of some terminology used to describe the rate at which convergence occurs.
In general, we would like the technique to converge as rapidly as possible. The following
definition is used to compare the convergence rates of sequences.

Definition 1.18 Suppose $\{\beta_n\}_{n=1}^{\infty}$ is a sequence known to converge to zero, and $\{\alpha_n\}_{n=1}^{\infty}$ converges to a
number $\alpha$. If a positive constant K exists with

$$|\alpha_n - \alpha| \le K|\beta_n|, \quad \text{for large } n,$$

then we say that $\{\alpha_n\}_{n=1}^{\infty}$ converges to $\alpha$ with rate, or order, of convergence $O(\beta_n)$. (This
expression is read “big oh of $\beta_n$”.) It is indicated by writing $\alpha_n = \alpha + O(\beta_n)$.

Margin note: There are numerous other ways of describing the growth of sequences and
functions, some of which require bounds both above and below the sequence or function
under consideration. Any good book that analyzes algorithms, for example [CLRS], will
include this information.

Although Definition 1.18 permits $\{\alpha_n\}_{n=1}^{\infty}$ to be compared with an arbitrary sequence
$\{\beta_n\}_{n=1}^{\infty}$, in nearly every situation we use

$$\beta_n = \frac{1}{n^{p}},$$

for some number p > 0. We are generally interested in the largest value of p with
$\alpha_n = \alpha + O(1/n^{p})$.

Example 2 Suppose that, for $n \ge 1$,

$$\alpha_n = \frac{n + 1}{n^{2}} \quad \text{and} \quad \hat{\alpha}_n = \frac{n + 3}{n^{3}}.$$

Both $\lim_{n \to \infty} \alpha_n = 0$ and $\lim_{n \to \infty} \hat{\alpha}_n = 0$, but the sequence $\{\hat{\alpha}_n\}$ converges to this limit
much faster than the sequence $\{\alpha_n\}$. Using five-digit rounding arithmetic we have the values
shown in Table 1.7. Determine rates of convergence for these two sequences.

Table 1.7

| n | 1 | 2 | 3 | 4 | 5 | 6 | 7 |
|---|---|---|---|---|---|---|---|
| $\alpha_n$ | 2.00000 | 0.75000 | 0.44444 | 0.31250 | 0.24000 | 0.19444 | 0.16327 |
| $\hat{\alpha}_n$ | 4.00000 | 0.62500 | 0.22222 | 0.10938 | 0.064000 | 0.041667 | 0.029155 |

Solution Define the sequences $\beta_n = 1/n$ and $\hat{\beta}_n = 1/n^{2}$. Then

$$|\alpha_n - 0| = \frac{n + 1}{n^{2}} \le \frac{n + n}{n^{2}} = 2 \cdot \frac{1}{n} = 2\beta_n,$$

and

$$|\hat{\alpha}_n - 0| = \frac{n + 3}{n^{3}} \le \frac{n + 3n}{n^{3}} = 4 \cdot \frac{1}{n^{2}} = 4\hat{\beta}_n.$$

Hence the rate of convergence of $\{\alpha_n\}$ to zero is similar to the convergence of $\{1/n\}$ to zero,
whereas $\{\hat{\alpha}_n\}$ converges to zero at a rate similar to the more rapidly convergent sequence
$\{1/n^{2}\}$. We express this by writing

$$\alpha_n = 0 + O\left( \frac{1}{n} \right) \quad \text{and} \quad \hat{\alpha}_n = 0 + O\left( \frac{1}{n^{2}} \right).$$
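The bounds used for the two sequences $(n+1)/n^2$ and $(n+3)/n^3$ can be checked numerically. The sketch below uses exact rational arithmetic from Python's `fractions` module; the function names `alpha` and `alpha_hat` and the range of n tested are assumptions made for illustration, and a finite check of this kind supports the bounds but does not replace the argument given in the solution.

```python
from fractions import Fraction

def alpha(n):
    """alpha_n = (n + 1) / n^2."""
    return Fraction(n + 1, n * n)

def alpha_hat(n):
    """alpha-hat_n = (n + 3) / n^3."""
    return Fraction(n + 3, n ** 3)

# Values for n = 1, ..., 7, for comparison with Table 1.7.
for n in range(1, 8):
    print(n, f"{float(alpha(n)):.5f}", f"{float(alpha_hat(n)):.6f}")

# Check |alpha_n| <= 2 (1/n) and |alpha-hat_n| <= 4 (1/n^2) for n up to 1000.
bounds_hold = all(
    alpha(n) <= 2 * Fraction(1, n) and alpha_hat(n) <= 4 * Fraction(1, n * n)
    for n in range(1, 1001)
)
print(bounds_hold)
```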


We also use the O (big oh) notation to describe the rate at which functions converge.

Suppose that $\lim_{h \to 0} G(h) = 0$ and $\lim_{h \to 0} F(h) = L$. If a positive constant K exists with

$$|F(h) - L| \le K|G(h)|, \quad \text{for sufficiently small } h,$$

then we write $F(h) = L + O(G(h))$.

The functions we use for comparison generally have the form $G(h) = h^{p}$, where p > 0.
We are interested in the largest value of p for which $F(h) = L + O(h^{p})$.

Example 3 Use the third Taylor polynomial about h = 0 to show that $\cos h + \frac{1}{2}h^{2} = 1 + O(h^{4})$.

Solution In Example 3(b) of Section 1.1 we found that this polynomial is

$$\cos h = 1 - \frac{1}{2}h^{2} + \frac{1}{24}h^{4} \cos \tilde{\xi}(h),$$

for some number $\tilde{\xi}(h)$ between zero and h. This implies that

$$\cos h + \frac{1}{2}h^{2} = 1 + \frac{1}{24}h^{4} \cos \tilde{\xi}(h).$$

Hence

$$\left| \left( \cos h + \frac{1}{2}h^{2} \right) - 1 \right| = \left| \frac{1}{24} \cos \tilde{\xi}(h) \right| h^{4} \le \frac{1}{24}h^{4},$$

so as $h \to 0$, $\cos h + \frac{1}{2}h^{2}$ converges to its limit, 1, about as fast as $h^{4}$ converges to 0. That
is,

$$\cos h + \frac{1}{2}h^{2} = 1 + O(h^{4}).$$


Maple uses the O notation to indicate the form of the error in Taylor polynomials and
in other situations. For example, at the end of Section 1.1 the third Taylor polynomial for
f(x) = cos(x) was found by first defining

    f := cos(x)

and then calling the third Taylor polynomial with

    taylor(f, x = 0, 4)

Maple responds with

    1 - (1/2) x^2 + O(x^4)

to indicate that the lowest term in the truncation error is $x^{4}$.
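The statement $\cos h + \frac{1}{2}h^2 = 1 + O(h^4)$ can also be examined numerically: if it holds with K = 1/24, the ratio $|\cos h + \frac{1}{2}h^2 - 1| / h^4$ should stay at or below 1/24 as h decreases. The sketch below uses ordinary double-precision arithmetic; the function name `scaled_error` and the sample values of h are assumptions, and h is not taken too small because the numerator itself then suffers from the subtraction of nearly equal numbers.

```python
import math

def scaled_error(h):
    """Return |cos h + h^2/2 - 1| / h^4."""
    return abs(math.cos(h) + 0.5 * h * h - 1.0) / h ** 4

ratios = {h: scaled_error(h) for h in (0.5, 0.1, 0.05, 0.01)}
for h, ratio in ratios.items():
    print(h, ratio, 1 / 24)
```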
EXERCISE SET 1.3


1 


1. a. Use three-digit chopping arithmetic to compute the sum =. (1/77) first by + + ; apse Pang 


and then by a + x tere t+ i, Which method is more accurate, and why? 
b. Write an algorithm to sum the finite series ~"”, x; in reverse order. 
2. The number e is defined by e = ye (/n), where n! = n(n — 1)---2-1 forn 4 0 and 0! = 1. 
Use four-digit chopping arithmetic to compute the following approximations to e, and determine the 
absolute and relative errors. 


=| re | 

a en 7 b. en Ss 7 
nerd PE (5-/)! 
0 4 10 1 

Cc er 4 d. ew 7 
n=0 Ht j=0 (10 D) 


3. The Maclaurin series for the arctangent function converges for $-1 < x \le 1$ and is given by

$$\arctan x = \lim_{n \to \infty} P_n(x) = \lim_{n \to \infty} \sum_{i=1}^{n} (-1)^{i+1} \frac{x^{2i-1}}{2i-1}.$$

a. Use the fact that $\tan \pi/4 = 1$ to determine the number of n terms of the series that need to be
summed to ensure that $|4P_n(1) - \pi| < 10^{-3}$.

b. The C++ programming language requires the value of $\pi$ to be within $10^{-10}$. How many terms
of the series would we need to sum to obtain this degree of accuracy?

4. Exercise 3 details a rather inefficient means of obtaining an approximation to $\pi$. The method can
be improved substantially by observing that $\pi/4 = \arctan \frac{1}{2} + \arctan \frac{1}{3}$ and evaluating the series
for the arctangent at $\frac{1}{2}$ and at $\frac{1}{3}$. Determine the number of terms that must be summed to ensure an
approximation to $\pi$ to within $10^{-3}$.


5. Another formula for computing 2 can be deduced from the identity 7/4 = 4 arctan t — arctan x: 
Determine the number of terms that must be summed to ensure an approximation to z to within 107°. 


6. Find the rates of convergence of the following sequences as $n \to \infty$.

a. $\lim_{n\to\infty} \sin \frac{1}{n} = 0$
b. $\lim_{n\to\infty} \sin \frac{1}{n^2} = 0$
c. $\lim_{n\to\infty} \left(\sin \frac{1}{n}\right)^2 = 0$
d. $\lim_{n\to\infty} [\ln(n+1) - \ln(n)] = 0$

7. Find the rates of convergence of the following functions as $h \to 0$.

a. $\lim_{h\to 0} \frac{\sin h}{h} = 1$
b. $\lim_{h\to 0} \frac{1 - \cos h}{h} = 0$
c. $\lim_{h\to 0} \frac{\sin h - h\cos h}{h} = 0$
d. $\lim_{h\to 0} \frac{1 - e^h}{h} = -1$


8. a. How many multiplications and additions are required to determine a sum of the form

\[ \sum_{i=1}^{n} \sum_{j=1}^{i} a_i b_j\,? \]

b. Modify the sum in part (a) to an equivalent form that reduces the number of computations.

9. Let $P(x) = a_n x^n + a_{n-1} x^{n-1} + \cdots + a_1 x + a_0$ be a polynomial, and let $x_0$ be given. Construct an
algorithm to evaluate $P(x_0)$ using nested multiplication.

10. Equations (1.2) and (1.3) in Section 1.2 give alternative formulas for the roots $x_1$ and $x_2$ of
$ax^2 + bx + c = 0$. Construct an algorithm with input a, b, c and output $x_1$, $x_2$ that computes
the roots $x_1$ and $x_2$ (which may be equal or be complex conjugates) using the best formula for each
root.

11. Construct an algorithm that has as input an integer $n \ge 1$, numbers $x_0, x_1, \ldots, x_n$, and a number x and
that produces as output the product $(x - x_0)(x - x_1) \cdots (x - x_n)$.
12. Assume that

\[ \frac{1 - 2x}{1 - x + x^2} + \frac{2x - 4x^3}{1 - x^2 + x^4} + \frac{4x^3 - 8x^7}{1 - x^4 + x^8} + \cdots = \frac{1 + 2x}{1 + x + x^2}, \]

for $x < 1$, and let $x = 0.25$. Write and execute an algorithm that determines the number of terms
needed on the left side of the equation so that the left side differs from the right side by less than $10^{-6}$.


13. a. Suppose that $0 < q < p$ and that $\alpha_n = \alpha + O(n^{-p})$. Show that $\alpha_n = \alpha + O(n^{-q})$.

b. Make a table listing $1/n$, $1/n^2$, $1/n^3$, and $1/n^4$ for n = 5, 10, 100, and 1000, and discuss the
varying rates of convergence of these sequences as n becomes large.

14. a. Suppose that $0 < q < p$ and that $F(h) = L + O(h^p)$. Show that $F(h) = L + O(h^q)$.

b. Make a table listing $h$, $h^2$, $h^3$, and $h^4$ for h = 0.5, 0.1, 0.01, and 0.001, and discuss the varying
rates of convergence of these powers of h as h approaches zero.


15. Suppose that as x approaches zero,

\[ F_1(x) = L_1 + O(x^{\alpha}) \quad \text{and} \quad F_2(x) = L_2 + O(x^{\beta}). \]

Let $c_1$ and $c_2$ be nonzero constants, and define

\[ F(x) = c_1 F_1(x) + c_2 F_2(x) \quad \text{and} \quad G(x) = F_1(c_1 x) + F_2(c_2 x). \]

Show that if $\gamma = \min\{\alpha, \beta\}$, then as x approaches zero,

a. $F(x) = c_1 L_1 + c_2 L_2 + O(x^{\gamma})$

b. $G(x) = L_1 + L_2 + O(x^{\gamma})$.

16. The sequence $\{F_n\}$ described by $F_0 = 1$, $F_1 = 1$, and $F_{n+2} = F_n + F_{n+1}$, if $n \ge 0$, is called a Fibonacci
sequence. Its terms occur naturally in many botanical species, particularly those with petals or scales
arranged in the form of a logarithmic spiral. Consider the sequence $\{x_n\}$, where $x_n = F_{n+1}/F_n$.
Assuming that $\lim_{n\to\infty} x_n = x$ exists, show that $x = (1 + \sqrt{5})/2$. This number is called the golden
ratio.

17. The Fibonacci sequence also satisfies the equation

\[ F_n \equiv \tilde{F}_n = \frac{1}{\sqrt{5}} \left[ \left( \frac{1 + \sqrt{5}}{2} \right)^n - \left( \frac{1 - \sqrt{5}}{2} \right)^n \right]. \]

a. Write a Maple procedure to calculate $F_{100}$.
b. Use Maple with the default value of Digits followed by evalf to calculate $\tilde{F}_{100}$.
c. Why is the result from part (a) more accurate than the result from part (b)?
d. Why is the result from part (b) obtained more rapidly than the result from part (a)?
e. What results when you use the command simplify instead of evalf to compute $\tilde{F}_{100}$?

18. The harmonic series $1 + \frac{1}{2} + \frac{1}{3} + \frac{1}{4} + \cdots$ diverges, but the sequence $\gamma_n = 1 + \frac{1}{2} + \cdots + \frac{1}{n} - \ln n$
converges, since $\{\gamma_n\}$ is a bounded, nonincreasing sequence. The limit $\gamma = 0.5772156649\ldots$ of the
sequence $\{\gamma_n\}$ is called Euler's constant.

a. Use the default value of Digits in Maple to determine the value of n for $\gamma_n$ to be within
$10^{-2}$ of $\gamma$.

b. Use the default value of Digits in Maple to determine the value of n for $\gamma_n$ to be within
$10^{-3}$ of $\gamma$.

c. What happens if you use the default value of Digits in Maple to determine the value of n for $\gamma_n$
to be within $10^{-4}$ of $\gamma$?
1.4 Numerical Software


Computer software packages for approximating the numerical solutions to problems are 
available in many forms. On our web site for the book 


http://www.math.ysu.edu/~faires/Numerical-Analysis/Programs.html 


we have provided programs written in C, FORTRAN, Maple, Mathematica, MATLAB, 
and Pascal, as well as JAVA applets. These can be used to solve the problems given in the 
examples and exercises, and will give satisfactory results for most problems that you may 
need to solve. However, they are what we call special-purpose programs. We use this term 
to distinguish these programs from those available in the standard mathematical subroutine 
libraries. The programs in these packages will be called general purpose. 

The programs in general-purpose software packages differ in their intent from the algo- 
rithms and programs provided with this book. General-purpose software packages consider 
ways to reduce errors due to machine rounding, underflow, and overflow. They also de- 
scribe the range of input that will lead to results of a certain specified accuracy. These are 
machine-dependent characteristics, so general-purpose software packages use parameters 
that describe the floating-point characteristics of the machine being used for computations. 


Illustration To illustrate some differences between programs included in a general-purpose package
and a program that we would provide for use in this book, let us consider an algorithm that
computes the Euclidean norm of an n-dimensional vector $x = (x_1, x_2, \ldots, x_n)^t$. This norm
is often required within larger programs and is defined by

\[ \|x\|_2 = \left[ \sum_{i=1}^{n} x_i^2 \right]^{1/2}. \]

The norm gives a measure for the distance from the vector x to the vector 0. For example,
the vector $x = (2, 1, 3, -2, -1)^t$ has

\[ \|x\|_2 = [2^2 + 1^2 + 3^2 + (-2)^2 + (-1)^2]^{1/2} = \sqrt{19}, \]

so its distance from $0 = (0, 0, 0, 0, 0)^t$ is $\sqrt{19} \approx 4.36$.

An algorithm of the type we would present for this problem is given here. It includes 
no machine-dependent parameters and provides no accuracy assurances, but it will give 
accurate results “most of the time.” 


INPUT $n, x_1, x_2, \ldots, x_n$.

OUTPUT NORM.

Step 1 Set SUM = 0.

Step 2 For $i = 1, 2, \ldots, n$ set SUM = SUM + $x_i^2$.

Step 3 Set NORM = SUM$^{1/2}$.

Step 4 OUTPUT (NORM);
       STOP.
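
As an illustration, the four steps above translate almost line for line into Python. This is a minimal sketch; the function name `naive_norm` is an assumption introduced here and is not one of the programs supplied with the book.

```python
import math

def naive_norm(x):
    """Euclidean norm by direct summation of squares (Steps 1-4 above)."""
    total = 0.0                 # Step 1
    for xi in x:                # Step 2
        total += xi * xi
    return math.sqrt(total)     # Steps 3 and 4

# The vector used in the Illustration; the result should be sqrt(19).
norm = naive_norm([2, 1, 3, -2, -1])
print(norm)
```

Like the algorithm it mirrors, the sketch contains no machine-dependent parameters and makes no accuracy guarantees.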


A program based on our algorithm is easy to write and understand. However, the pro- 
gram could fail to give sufficient accuracy for a number of reasons. For example, the magni- 
tude of some of the numbers might be too large or too small to be accurately represented in 
the floating-point system of the computer. Also, this order for performing the calculations
might not produce the most accurate results, or the standard software square-root routine 
might not be the best available for the problem. Matters of this type are considered by algo- 
rithm designers when writing programs for general-purpose software. These programs are 
often used as subprograms for solving larger problems, so they must incorporate controls 
that we will not need. 


General Purpose Algorithms 


Let us now consider an algorithm for a general-purpose software program for computing
the Euclidean norm. First, it is possible that although a component $x_i$ of the vector is within
the range of the machine, the square of the component is not. This can occur when some $|x_i|$
is so small that $x_i^2$ causes underflow or when some $|x_i|$ is so large that $x_i^2$ causes overflow.
It is also possible for all these terms to be within the range of the machine, but overflow
occurs from the addition of a square of one of the terms to the previously computed sum.
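
The three failure modes just described can be observed directly. The sketch below assumes IEEE 754 double-precision arithmetic, which is what Python floats use on common platforms; the sample magnitudes are illustrative choices, not values from the text.

```python
import math

big = 1e200
tiny = 1e-200

big_squared = big * big      # too large for a double: becomes inf
tiny_squared = tiny * tiny   # too small for a double: becomes 0.0

# Each square below is representable, but their sum is not.
a = 1.2e154
one_square = a * a
sum_of_squares = one_square + one_square

print(big_squared, tiny_squared, one_square, sum_of_squares)
```

Plain multiplication is used deliberately: in Python the `**` operator raises an exception on floating-point overflow, whereas `*` and `+` return `inf`.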

Accuracy criteria depend on the machine on which the calculations are being performed, 
so machine-dependent parameters are incorporated into the algorithm. Suppose we are 
working on a hypothetical computer with base 10, having $t \ge 4$ digits of precision, a
minimum exponent emin, and a maximum exponent emax. Then the set of floating-point
numbers in this machine consists of 0 and the numbers of the form

\[ x = f \cdot 10^{e}, \quad \text{where} \quad f = \pm(f_1 10^{-1} + f_2 10^{-2} + \cdots + f_t 10^{-t}), \]

where $1 \le f_1 \le 9$ and $0 \le f_i \le 9$, for each $i = 2, \ldots, t$, and where emin $\le e \le$ emax.
These constraints imply that the smallest positive number represented in the machine is
$\sigma = 10^{emin-1}$, so any computed number x with $|x| < \sigma$ causes underflow and results in
x being set to 0. The largest positive number is $\lambda = (1 - 10^{-t})10^{emax}$, and any computed
number x with $|x| > \lambda$ causes overflow. When underflow occurs, the program will continue,
often without a significant loss of accuracy. If overflow occurs, the program will fail.
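
To make the formulas for $\sigma$ and $\lambda$ concrete, the following sketch evaluates them exactly with rational arithmetic. The function name `machine_limits` and the sample machine (t = 4, emin = -5, emax = 5) are assumptions chosen only for illustration.

```python
from fractions import Fraction

def machine_limits(t, emin, emax):
    """Smallest and largest positive numbers of the base-10 model above."""
    sigma = Fraction(10) ** (emin - 1)                        # 0.1 * 10^emin
    lam = (1 - Fraction(1, 10 ** t)) * Fraction(10) ** emax   # 0.99...9 * 10^emax
    return sigma, lam

# Hypothetical machine: 4 digits, exponents from -5 to 5.
sigma, lam = machine_limits(4, -5, 5)
print(sigma, lam)
```

For this sample machine the smallest positive number is $0.1000 \times 10^{-5} = 10^{-6}$ and the largest is $0.9999 \times 10^{5} = 99990$.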

The algorithm assumes that the floating-point characteristics of the machine are de- 
scribed using parameters N, s, S, y, and Y. The maximum number of entries that can be 
summed with at least t/2 digits of accuracy is given by N. This implies the algorithm will 
proceed to find the norm of a vector $x = (x_1, x_2, \ldots, x_n)^t$ only if $n \le N$. To resolve the
underflow-overflow problem, the nonzero floating-point numbers are partitioned into three
groups:


• small-magnitude numbers x, those satisfying $0 < |x| < y$;
• medium-magnitude numbers x, where $y \le |x| < Y$;
• large-magnitude numbers x, where $Y \le |x|$.


The parameters y and Y are chosen so that there will be no underflow-overflow prob- 
lem in squaring and summing the medium-magnitude numbers. Squaring small-magnitude 
numbers can cause underflow, so a scale factor S much greater than 1 is used with the result 
that $(Sx)^2$ avoids the underflow even when $x^2$ does not. Summing and squaring numbers
having a large magnitude can cause overflow. So in this case, a positive scale factor s much
smaller than 1 is used to ensure that $(sx)^2$ does not cause overflow when calculated or
incorporated into a sum, even though $x^2$ would.

To avoid unnecessary scaling, y and Y are chosen so that the range of medium- 
magnitude numbers is as large as possible. The algorithm that follows is a modification 
of one described in [Brow, W], p. 471. It incorporates a procedure for adding scaled components
of the vector that are small in magnitude until a component with medium magnitude
is encountered. It then unscales the previous sum and continues by squaring and summing
small and medium numbers until a component with a large magnitude is encountered. Once 
a component with large magnitude appears, the algorithm scales the previous sum and 
proceeds to scale, square, and sum the remaining numbers. 

The algorithm assumes that, in transition from small to medium numbers, unscaled 
small numbers are negligible when compared to medium numbers. Similarly, in transition 
from medium to large numbers, unscaled medium numbers are negligible when compared to 
large numbers. Thus, the choices of the scaling parameters must be made so that numbers 
are equated to 0 only when they are truly negligible. Typical relationships between the 
machine characteristics as described by t, $\sigma$, $\lambda$, emin, emax, and the algorithm parameters
N, s, S, y, and Y are given after the algorithm.

The algorithm uses three flags to indicate the various stages in the summation process. 
These flags are given initial values in Step 3 of the algorithm. FLAG 1 is 1 until a medium or
large component is encountered; then it is changed to 0. FLAG 2 is 0 while small numbers 
are being summed, changes to 1 when a medium number is first encountered, and changes 
back to 0 when a large number is found. FLAG 3 is initially 0 and changes to 1 when a 
large number is first encountered. Step 3 also introduces the flag DONE, which is 0 until 
the calculations are complete, and then changes to 1. 


INPUT $N, s, S, y, Y, \lambda, n, x_1, x_2, \ldots, x_n$.
OUTPUT NORM or an appropriate error message.

Step 1 If $n \le 0$ then OUTPUT ('The integer n must be positive.');
       STOP.

Step 2 If $n > N$ then OUTPUT ('The integer n is too large.');
       STOP.

Step 3 Set SUM = 0;
       FLAG1 = 1; (The small numbers are being summed.)
       FLAG2 = 0;
       FLAG3 = 0;
       DONE = 0;
       i = 1.

Step 4 While ($i \le n$ and FLAG1 = 1) do Step 5.

  Step 5 If $|x_i| < y$ then set SUM = SUM + $(Sx_i)^2$;
                i = i + 1
         else set FLAG1 = 0. (A non-small number encountered.)

Step 6 If $i > n$ then set NORM = (SUM)$^{1/2}$/S;
                DONE = 1
       else set SUM = (SUM/S)/S; (Scale for larger numbers.)
                FLAG2 = 1.

Step 7 While ($i \le n$ and FLAG2 = 1) do Step 8. (Sum the medium-sized numbers.)

  Step 8 If $|x_i| < Y$ then set SUM = SUM + $x_i^2$;
                i = i + 1
         else set FLAG2 = 0. (A large number has been encountered.)

Step 9 If DONE = 0 then
         if $i > n$ then set NORM = (SUM)$^{1/2}$;
                DONE = 1
         else set SUM = ((SUM)s)s; (Scale the large numbers.)
                FLAG3 = 1.
The first portable computer was the Osborne I, produced in 1981, although it was much larger and
heavier than we would currently think of as portable.

The system FORTRAN (FORmula TRANslator) was the original general-purpose scientific programming
language. It is still in wide use in situations that require intensive scientific computations.

The EISPACK project was the first large-scale numerical software package to be made available in
the public domain and led the way for many packages to follow.


Step 10 While ($i \le n$ and FLAG3 = 1) do Step 11. (Sum the large numbers.)

  Step 11 Set SUM = SUM + $(sx_i)^2$;
              i = i + 1.

Step 12 If DONE = 0 then
          if SUM$^{1/2}$ < $\lambda s$ then set NORM = (SUM)$^{1/2}$/s;
                DONE = 1
          else set SUM = $\lambda$. (The norm is too large.)

Step 13 If DONE = 1 then OUTPUT ('Norm is', NORM)
        else OUTPUT ('Norm >', NORM, 'overflow occurred').

Step 14 STOP.


The relationships between the machine characteristics t, $\sigma$, $\lambda$, emin, emax, and the
algorithm parameters N, s, S, y, and Y were chosen in [Brow, W], p. 471, as:


$N = 10^{e_N}$, where $e_N = \lfloor (t - 2)/2 \rfloor$, the greatest integer less than or equal to
$(t - 2)/2$;
$s = 10^{e_s}$, where $e_s = \lfloor -(emax + e_N)/2 \rfloor$;
$S = 10^{e_S}$, where $e_S = \lceil (1 - emin)/2 \rceil$, the smallest integer greater than or equal
to $(1 - emin)/2$;
$y = 10^{e_y}$, where $e_y = \lceil (emin + t - 2)/2 \rceil$;
$Y = 10^{e_Y}$, where $e_Y = \lfloor (emax - e_N)/2 \rfloor$.
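
The scaled algorithm and the parameter relationships above can be sketched in Python as follows. This is an illustrative sketch under stated assumptions, not the published routine: the names `brown_parameters` and `scaled_norm` are invented here, the flags are replaced by equivalent loop conditions, an exception is raised in place of the overflow message, and the values t = 16, emin = -307, emax = 308 are an assumed decimal stand-in for IEEE double precision.

```python
import math

def brown_parameters(t, emin, emax):
    """Return (N, s, S, y, Y, lam) from the relationships listed above."""
    e_N = math.floor((t - 2) / 2)
    e_s = math.floor(-(emax + e_N) / 2)
    e_S = math.ceil((1 - emin) / 2)
    e_y = math.ceil((emin + t - 2) / 2)
    e_Y = math.floor((emax - e_N) / 2)
    lam = (1 - 10.0 ** (-t)) * 10.0 ** emax
    return 10 ** e_N, 10.0 ** e_s, 10.0 ** e_S, 10.0 ** e_y, 10.0 ** e_Y, lam

def scaled_norm(x, N, s, S, y, Y, lam):
    """Euclidean norm with scaling of small and large components."""
    n = len(x)
    if n <= 0:                                   # Step 1
        raise ValueError("The integer n must be positive.")
    if n > N:                                    # Step 2
        raise ValueError("The integer n is too large.")
    total, i = 0.0, 0                            # Step 3 (0-based index)
    while i < n and abs(x[i]) < y:               # Steps 4-5: small numbers
        v = S * x[i]
        total += v * v
        i += 1
    if i == n:                                   # Step 6
        return math.sqrt(total) / S
    total = (total / S) / S                      # unscale the small sum
    while i < n and abs(x[i]) < Y:               # Steps 7-8: medium numbers
        total += x[i] * x[i]
        i += 1
    if i == n:                                   # Step 9
        return math.sqrt(total)
    total = (total * s) * s                      # scale for large numbers
    while i < n:                                 # Steps 10-11: remaining numbers
        v = s * x[i]
        total += v * v
        i += 1
    if math.sqrt(total) < lam * s:               # Step 12
        return math.sqrt(total) / s
    raise OverflowError("Norm exceeds the largest machine number.")

params = brown_parameters(16, -307, 308)         # assumed model parameters
print(scaled_norm([3e200, 4e200], *params))      # direct squaring would overflow
print(scaled_norm([3e-200, 4e-200], *params))    # direct squaring would underflow
```

With these assumed parameters the sketch returns values close to 5e200 and 5e-200, cases in which summing unscaled squares in double precision would give inf and 0.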


The reliability built into this algorithm has greatly increased the complexity compared to 
the algorithm given earlier in the section. In the majority of cases the special-purpose and 
general-purpose algorithms give identical results. The advantage of the general-purpose 
algorithm is that it provides security for its results. 

Many forms of general-purpose numerical software are available commercially and in 
the public domain. Most of the early software was written for mainframe computers, and 
a good reference for this is Sources and Development of Mathematical Software, edited by 
Wayne Cowell [Co]. 

Now that personal computers are sufficiently powerful, standard numerical software 
is available for them. Most of this numerical software is written in FORTRAN, although 
some packages are written in C, C++, and FORTRAN90.

ALGOL procedures were presented for matrix computations in 1971 in [WR]. A pack- 
age of FORTRAN subroutines based mainly on the ALGOL procedures was then developed 
into the EISPACK routines. These routines are documented in the manuals published by 
Springer-Verlag as part of their Lecture Notes in Computer Science series [Sm,B] and [Gar]. 
The FORTRAN subroutines are used to compute eigenvalues and eigenvectors for a variety 
of different types of matrices. 

LINPACK is a package of FORTRAN subroutines for analyzing and solving systems 
of linear equations and solving linear least squares problems. The documentation for this 
package is contained in [DBMS]. A step-by-step introduction to LINPACK, EISPACK, and 
BLAS (Basic Linear Algebra Subprograms) is given in [CV]. 

The LAPACK package, first available in 1992, is a library of FORTRAN subroutines
that supersedes LINPACK and EISPACK by integrating these two sets of algorithms into
a unified and updated package. The software has been restructured to achieve greater
efficiency on vector processors and other high-performance or shared-memory multiprocessors.
LAPACK is expanded in depth and breadth in version 3.0, which is available in FORTRAN,
FORTRAN90, C, C++, and JAVA. C and JAVA are available only as language interfaces
or translations of the FORTRAN libraries of LAPACK. The package BLAS is not a part of
LAPACK, but the code for BLAS is distributed with LAPACK.

Software engineering was established as a laboratory discipline during the 1970s and
1980s. EISPACK was developed at Argonne Labs and LINPACK there shortly thereafter. By the
early 1980s, Argonne was internationally recognized as a world leader in symbolic and
numerical computation.

In 1970 IMSL became the first large-scale scientific library for mainframes. Since that time, the
libraries have been made available for computer systems ranging from supercomputers to
personal computers.

The Numerical Algorithms Group (NAG) was instituted in the UK in 1971 and developed
the first mathematical software library. It now has over 10,000 users world-wide and contains
over 1000 mathematical and statistical functions ranging from statistical, symbolic,
visualisation, and numerical simulation software, to compilers and application development
tools.

MATLAB was originally written to provide easy access to matrix software developed in the
LINPACK and EISPACK projects. The first version was written in the late 1970s for use
in courses in matrix theory, linear algebra, and numerical analysis. There are currently more than
500,000 users of MATLAB in more than 100 countries.

Other packages for solving specific types of problems are available in the public domain. 
As an alternative to netlib, you can use Xnetlib to search the database and retrieve software. 
More information can be found in the article Software Distribution using Netlib by Dongarra, 
Roman, and Wade [DRW]. 

These software packages are highly efficient, accurate, and reliable. They are thor- 
oughly tested, and documentation is readily available. Although the packages are portable, 
it is a good idea to investigate the machine dependence and read the documentation thor- 
oughly. The programs test for almost all special contingencies that might result in error and 
failures. At the end of each chapter we will discuss some of the appropriate general-purpose 
packages. 

Commercially available packages also represent the state of the art in numerical meth- 
ods. Their contents are often based on the public-domain packages but include methods in 
libraries for almost every type of problem. 

IMSL (International Mathematical and Statistical Libraries) consists of the libraries 
MATH, STAT, and SFUN for numerical mathematics, statistics, and special functions, re- 
spectively. These libraries contain more than 900 subroutines originally available in FOR- 
TRAN 77 and now available in C, FORTRAN90, and JAVA. These subroutines solve the 
most common numerical analysis problems. The libraries are available commercially from 
Visual Numerics. 

The packages are delivered in compiled form with extensive documentation. There is an 
example program for each routine as well as background reference information. IMSL con- 
tains methods for linear systems, eigensystem analysis, interpolation and approximation, 
integration and differentiation, differential equations, transforms, nonlinear equations, opti- 
mization, and basic matrix/vector operations. The library also contains extensive statistical 
routines. 

The Numerical Algorithms Group (NAG) has been in existence in the United Kingdom 
since 1970. NAG offers more than 1000 subroutines in a FORTRAN 77 library, about 400 
subroutines in a C library, more than 200 subroutines in a FORTRAN 90 library, and an 
MPI FORTRAN numerical library for parallel machines and clusters of workstations or 
personal computers. A useful introduction to the NAG routines is [Ph]. The NAG library 
contains routines to perform most standard numerical analysis tasks in a manner similar to 
those in the IMSL. It also includes some statistical routines and a set of graphic routines. 

The IMSL and NAG packages are designed for the mathematician, scientist, or engineer 
who wishes to call high-quality C, Java, or FORTRAN subroutines from within a program. 
The documentation available with the commercial packages illustrates the typical driver 
program required to use the library routines. The next three software packages are stand- 
alone environments. When activated, the user enters commands to cause the package to solve 
a problem. However, each package allows programming within the command language. 

MATLAB is a matrix laboratory that was originally a Fortran program published by 
Cleve Moler [Mo] in the 1980s. The laboratory is based mainly on the EISPACK and 
LINPACK subroutines, although functions such as nonlinear systems, numerical integration, 
cubic splines, curve fitting, optimization, ordinary differential equations, and graphical tools 
have been incorporated. MATLAB is currently written in C and assembler, and the PC 
version of this package requires a numeric coprocessor. The basic structure is to perform 
matrix operations, such as finding the eigenvalues of a matrix entered from the command 
line or from an external file via function calls. This is a powerful self-contained system that 
is especially useful for instruction in an applied linear algebra course. 

The second package is GAUSS, a mathematical and statistical system produced by Lee
E. Edlefsen and Samuel D. Jones in 1985. It is coded mainly in assembler and based primarily
on EISPACK and LINPACK. As in the case of MATLAB, integration/differentiation, nonlinear
systems, fast Fourier transforms, and graphics are available. GAUSS is oriented less
toward instruction in linear algebra and more toward statistical analysis of data. This package
also uses a numeric coprocessor if one is available.

The third package is Maple, a computer algebra system developed in 1980 by the 
Symbolic Computational Group at the University of Waterloo. The design for the original 
Maple system is presented in the paper by B.W. Char, K.O. Geddes, W.M. Gentleman, and
G.H. Gonnet [CGGG]. 


Maple, which is written in C, has the ability to manipulate information in a symbolic
manner. This symbolic manipulation allows the user to obtain exact answers instead of
numerical values. Maple can give exact answers to mathematical problems such as integrals,
differential equations, and linear systems. It contains a programming structure and permits
text, as well as commands, to be saved in its worksheet files. These worksheets can then
be loaded into Maple and the commands executed. Because of the properties of symbolic
computation, numerical computation, and worksheets, Maple is the language of choice for
this text. Throughout the book Maple commands, particularly from the NumericalAnalysis
package, will be included in the text.

The NAG routines are compatible with Maple beginning with version 9.0.


Numerous packages are available that can be classified as supercalculator packages for
the PC. These should not be confused, however, with the general-purpose software listed
here. If you have an interest in one of these packages, you should read Supercalculators on
the PC by B. Simon and R. M. Wilson [SW].

Although we have chosen Maple as our standard computer algebra system, the equally popular
Mathematica, released in 1988, can also be used for this purpose.

Additional information about software and software libraries can be found in the books
by Cody and Waite [CW] and by Kockler [Ko], and in the 1995 article by Dongarra and
Walker [DW]. More information about floating-point computation can be found in the book
by Chaitini-Chatelin and Frayse [CF] and the article by Goldberg [Go].

Books that address the application of numerical techniques on parallel computers include
those by Schendell [Sche], Phillips and Freeman [PF], Ortega [Or1], and Golub and
Ortega [GO].
Solutions of Equations in One Variable


Introduction 


The growth of a population can often be modeled over short periods of time by assuming that 
the population grows continuously with time at a rate proportional to the number present at 
that time. Suppose that N(t) denotes the number in the population at time t and $\lambda$ denotes the
constant birth rate of the population. Then the population satisfies the differential equation

\[ \frac{dN(t)}{dt} = \lambda N(t), \]

whose solution is $N(t) = N_0 e^{\lambda t}$, where $N_0$ denotes the initial population.
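[Figure: graph of $N(\lambda) = 1000e^{\lambda} + \frac{435}{\lambda}(e^{\lambda} - 1)$, with population (thousands) on the vertical axis and
birth rate $\lambda$ on the horizontal axis; the population levels 1435 and 1564 are marked.]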


This exponential model is valid only when the population is isolated, with no immigration.
If immigration is permitted at a constant rate v, then the differential equation
becomes

\[ \frac{dN(t)}{dt} = \lambda N(t) + v, \]

whose solution is

\[ N(t) = N_0 e^{\lambda t} + \frac{v}{\lambda}(e^{\lambda t} - 1). \]
Suppose a certain population contains N(0) = 1,000,000 individuals initially, that
435,000 individuals immigrate into the community in the first year, and that N(1) =
1,564,000 individuals are present at the end of one year. To determine the birth rate of
this population, we need to find $\lambda$ in the equation

\[ 1{,}564{,}000 = 1{,}000{,}000\,e^{\lambda} + \frac{435{,}000}{\lambda}(e^{\lambda} - 1). \]

It is not possible to solve explicitly for $\lambda$ in this equation, but numerical methods discussed in
this chapter can be used to approximate solutions of equations of this type to an arbitrarily
high accuracy. The solution to this particular problem is considered in Exercise 24 of
Section 2.3.
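
Before any root-finding method is applied, it helps to see that the equation can be rewritten as $f(\lambda) = 0$ and that f changes sign on a short interval. The sketch below is illustrative only; the function name `population_residual` and the trial interval [0.10, 0.11] are assumptions made here.

```python
import math

def population_residual(lam):
    """f(lam) = 1,000,000 e^lam + (435,000/lam)(e^lam - 1) - 1,564,000."""
    return (1_000_000 * math.exp(lam)
            + 435_000 / lam * (math.exp(lam) - 1.0)
            - 1_564_000)

# Hypothetical trial interval for the birth rate.
left, right = 0.10, 0.11
print(population_residual(left), population_residual(right))
```

The residual is negative at 0.10 and positive at 0.11, so a birth rate satisfying the equation lies between these two values.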


2.1 The Bisection Method


In this chapter we consider one of the most basic problems of numerical approximation, 
the root-finding problem. This process involves finding a root, or solution, of an equation 
of the form f(x) = 0, for a given function f. A root of this equation is also called a zero 
of the function f.

The problem of finding an approximation to the root of an equation can be traced back
at least to 1700 B.C.E. A cuneiform table in the Yale Babylonian Collection dating from that
period gives a sexagesimal (base-60) number equivalent to 1.414222 as an approximation to
$\sqrt{2}$, a result that is accurate to within $10^{-5}$. This approximation can be found by applying
a technique described in Exercise 19 of Section 2.2.


Bisection Technique 


The first technique, based on the Intermediate Value Theorem, is called the Bisection, or
Binary-search, method.

In computer science, the process of dividing a set continually in half to search for the solution to a
problem, as the bisection method does, is known as a binary search procedure.

Suppose f is a continuous function defined on the interval [a, b], with f(a) and f(b)
of opposite sign. The Intermediate Value Theorem implies that a number p exists in (a, b)
with f(p) = 0. Although the procedure will work when there is more than one root in the
interval (a, b), we assume for simplicity that the root in this interval is unique. The method
calls for a repeated halving (or bisecting) of subintervals of [a, b] and, at each step, locating
the half containing p.

To begin, set $a_1 = a$ and $b_1 = b$, and let $p_1$ be the midpoint of [a, b]; that is,

\[ p_1 = a_1 + \frac{b_1 - a_1}{2} = \frac{a_1 + b_1}{2}. \]

• If $f(p_1) = 0$, then $p = p_1$, and we are done.

• If $f(p_1) \ne 0$, then $f(p_1)$ has the same sign as either $f(a_1)$ or $f(b_1)$.

  • If $f(p_1)$ and $f(a_1)$ have the same sign, $p \in (p_1, b_1)$. Set $a_2 = p_1$ and $b_2 = b_1$.
  • If $f(p_1)$ and $f(a_1)$ have opposite signs, $p \in (a_1, p_1)$. Set $a_2 = a_1$ and $b_2 = p_1$.


Then reapply the process to the interval $[a_2, b_2]$. This produces the method described in
Algorithm 2.1. (See Figure 2.1.)
Figure 2.1


ALGORITHM 2.1 Bisection


To find a solution to f(x) = 0 given the continuous function f on the interval [a, b], where 
f(a) and f(b) have opposite signs: 


INPUT endpoints a, b; tolerance TOL; maximum number of iterations $N_0$.
OUTPUT approximate solution p or message of failure.

Step 1 Set i = 1;
       FA = f(a).

Step 2 While $i \le N_0$ do Steps 3-6.

  Step 3 Set p = a + (b - a)/2; (Compute $p_i$.)
         FP = f(p).

  Step 4 If FP = 0 or (b - a)/2 < TOL then
           OUTPUT (p); (Procedure completed successfully.)
           STOP.

  Step 5 Set i = i + 1.

  Step 6 If FA · FP > 0 then set a = p; (Compute $a_i$, $b_i$.)
                FA = FP
         else set b = p. (FA is unchanged.)

Step 7 OUTPUT ('Method failed after $N_0$ iterations, $N_0$ =', $N_0$);
       (The procedure was unsuccessful.)
       STOP.
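
The steps of the algorithm can be sketched in Python as shown below. This is a minimal illustration under stated assumptions: the function name `bisection` and its return value (the approximation together with the iteration count) are choices made here, and failure is signaled by an exception rather than a printed message. The test function is the one used in Example 1 of this section.

```python
def bisection(f, a, b, tol, max_iter):
    """Return (p, iterations) following Steps 1-7 of the Bisection algorithm."""
    fa = f(a)                                   # Step 1
    for i in range(1, max_iter + 1):            # Step 2
        p = a + (b - a) / 2                     # Step 3
        fp = f(p)
        if fp == 0 or (b - a) / 2 < tol:        # Step 4
            return p, i
        if fa * fp > 0:                         # Step 6: root lies in (p, b)
            a, fa = p, fp
        else:                                   # root lies in (a, p)
            b = p
    raise RuntimeError(f"Method failed after {max_iter} iterations")  # Step 7

# f(x) = x^3 + 4x^2 - 10 has a root in [1, 2].
root, steps = bisection(lambda x: x**3 + 4 * x**2 - 10, 1.0, 2.0, 1e-4, 50)
print(root, steps)
```

The midpoint is computed as a + (b - a)/2, as in Step 3, rather than (a + b)/2; the caller is assumed to supply endpoints at which f has opposite signs.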


Other stopping procedures can be applied in Step 4 of Algorithm 2.1 or in any of
the iterative techniques in this chapter. For example, we can select a tolerance $\varepsilon > 0$ and
generate $p_1, \ldots, p_N$ until one of the following conditions is met:

\[ |p_N - p_{N-1}| < \varepsilon, \tag{2.1} \]
\[ \frac{|p_N - p_{N-1}|}{|p_N|} < \varepsilon, \quad p_N \ne 0, \quad \text{or} \tag{2.2} \]
\[ |f(p_N)| < \varepsilon. \tag{2.3} \]

Unfortunately, difficulties can arise using any of these stopping criteria. For example,
there are sequences $\{p_n\}_{n=0}^{\infty}$ with the property that the differences $p_n - p_{n-1}$ converge to
zero while the sequence itself diverges. (See Exercise 17.) It is also possible for $f(p_n)$ to
be close to zero while $p_n$ differs significantly from p. (See Exercise 16.) Without additional
knowledge about f or p, Inequality (2.2) is the best stopping criterion to apply because it
comes closest to testing relative error.

When using a computer to generate approximations, it is good practice to set an upper
bound on the number of iterations. This eliminates the possibility of entering an infinite
loop, a situation that can arise when the sequence diverges (and also when the program is
incorrectly coded). This was done in Step 2 of Algorithm 2.1 where the bound $N_0$ was set
and the procedure terminated if $i > N_0$.
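
The three stopping tests (2.1)-(2.3) are easy to state as code, and a small experiment shows how two of them can mislead. The sketch below is illustrative; the function names and the two sample sequences (harmonic partial sums, and $p_n = 1 + 1/n$ with $f(x) = (x - 1)^{10}$) are assumptions chosen here for demonstration.

```python
def absolute_change(p_new, p_old, eps):
    """Criterion (2.1)."""
    return abs(p_new - p_old) < eps

def relative_change(p_new, p_old, eps):
    """Criterion (2.2); requires p_new != 0."""
    return abs(p_new - p_old) / abs(p_new) < eps

def small_residual(f, p_new, eps):
    """Criterion (2.3)."""
    return abs(f(p_new)) < eps

def harmonic(n):
    """n-th partial sum of the harmonic series, which grows without bound."""
    return sum(1.0 / k for k in range(1, n + 1))

# Successive differences are 1/n, so (2.1) is met although the sums diverge.
met_2_1 = absolute_change(harmonic(2000), harmonic(1999), 1e-3)

# f(1.5) is below 1e-3, so (2.3) is met although 1.5 is 0.5 away from the root 1.
met_2_3 = small_residual(lambda x: (x - 1) ** 10, 1.5, 1e-3)

print(met_2_1, met_2_3)
```

Both tests report success in situations where the current iterate is not a good approximation, which is why an iteration limit is kept as a safeguard.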

Note that to start the Bisection Algorithm, an interval [a, b] must be found with
$f(a) \cdot f(b) < 0$. At each step the length of the interval known to contain a zero of f is reduced
by a factor of 2; hence it is advantageous to choose the initial interval [a, b] as small as
possible. For example, if $f(x) = 2x^3 - x^2 + x - 1$, we have both

\[ f(-4) \cdot f(4) < 0 \quad \text{and} \quad f(0) \cdot f(1) < 0, \]

so the Bisection Algorithm could be used on [-4, 4] or on [0, 1]. Starting the Bisection
Algorithm on [0, 1] instead of [-4, 4] will reduce by 3 the number of iterations required to
achieve a specified accuracy.
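
The saving of three iterations can be checked by counting halvings. The sketch below assumes that accuracy is measured by the quantity $(b - a)/2^n$ falling below a tolerance; the function name `iterations_needed` and the tolerance $10^{-4}$ are illustrative choices.

```python
def iterations_needed(a, b, tol):
    """Smallest n for which (b - a) / 2**n is less than tol."""
    n = 0
    length = b - a
    while length >= tol:
        length /= 2      # one bisection step halves the interval
        n += 1
    return n

wide = iterations_needed(-4.0, 4.0, 1e-4)
narrow = iterations_needed(0.0, 1.0, 1e-4)
print(wide, narrow, wide - narrow)
```

The interval [-4, 4] is $2^3$ times as long as [0, 1], so three extra halvings are needed regardless of the tolerance chosen.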

The following example illustrates the Bisection Algorithm. The iteration in this example
is terminated when a bound for the relative error is less than 0.0001. This is ensured by
having

\[ \frac{|p - p_n|}{\min\{|a_n|, |b_n|\}} < 10^{-4}. \]

Example 1 Show that $f(x) = x^3 + 4x^2 - 10 = 0$ has a root in [1, 2], and use the Bisection method to
determine an approximation to the root that is accurate to at least within $10^{-4}$.


Solution Because f(1) = -5 and f(2) = 14, the Intermediate Value Theorem 1.11 ensures
that this continuous function has a root in [1, 2].

For the first iteration of the Bisection method we use the fact that at the midpoint of
[1, 2] we have f(1.5) = 2.375 > 0. This indicates that we should select the interval [1, 1.5]
for our second iteration. Then we find that f(1.25) = -1.796875, so our new interval
becomes [1.25, 1.5], whose midpoint is 1.375. Continuing in this manner gives the values
in Table 2.1. After 13 iterations, $p_{13} = 1.365112305$ approximates the root p with an error

\[ |p - p_{13}| < |b_{14} - a_{14}| = |1.365234375 - 1.365112305| = 0.000122070. \]

Since $|a_{14}| < |p|$, we have

\[ \frac{|p - p_{13}|}{|p|} < \frac{|b_{14} - a_{14}|}{|a_{14}|} \le 9.0 \times 10^{-5}, \]




Table 2.1 


Theorem 2.1 




 n   a_n           b_n           p_n           f(p_n)
 1   1.0           2.0           1.5            2.375
 2   1.0           1.5           1.25          -1.79687
 3   1.25          1.5           1.375          0.16211
 4   1.25          1.375         1.3125        -0.84839
 5   1.3125        1.375         1.34375       -0.35098
 6   1.34375       1.375         1.359375      -0.09641
 7   1.359375      1.375         1.3671875      0.03236
 8   1.359375      1.3671875     1.36328125    -0.03215
 9   1.36328125    1.3671875     1.365234375    0.000072
10   1.36328125    1.365234375   1.364257813   -0.01605
11   1.364257813   1.365234375   1.364746094   -0.00799
12   1.364746094   1.365234375   1.364990235   -0.00396
13   1.364990235   1.365234375   1.365112305   -0.00194


so the approximation is correct to at least within $10^{-4}$. The correct value of $p$ to nine decimal
places is $p = 1.365230013$. Note that $p_9$ is closer to $p$ than is the final approximation $p_{13}$.
You might suspect this is true because $|f(p_9)| < |f(p_{13})|$, but we cannot be sure of this
unless the true answer is known.
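
The computation in this example can be sketched in Python. This is a minimal illustration, not the text's Algorithm; the function name `bisection` and its parameters are chosen here for illustration, and the stopping test assumes the relative-error bound used above, with $|p - p_n|$ bounded by half the current interval length.

```python
def bisection(f, a, b, tol=1e-4, max_iter=100):
    """Return (n, p_n) once (b_n - a_n)/2 < tol * min(|a_n|, |b_n|).

    Assumes f is continuous on [a, b] with f(a) and f(b) of opposite sign.
    """
    fa = f(a)
    if fa * f(b) >= 0:
        raise ValueError("f(a) and f(b) must have opposite signs")
    for n in range(1, max_iter + 1):
        half = (b - a) / 2          # bound on |p - p_n|
        p = a + half                # midpoint p_n
        fp = f(p)
        if fp == 0 or half < tol * min(abs(a), abs(b)):
            return n, p
        if fa * fp > 0:             # root lies in [p_n, b_n]
            a, fa = p, fp
        else:                       # root lies in [a_n, p_n]
            b = p
    raise RuntimeError(f"no convergence within {max_iter} iterations")


def f(x):
    return x**3 + 4 * x**2 - 10


n, p = bisection(f, 1.0, 2.0)
print(n, p)
```

With the interval $[1, 2]$ and tolerance $10^{-4}$, the loop stops at $n = 13$, matching the last row of Table 2.1.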


The Bisection method, though conceptually clear, has significant drawbacks. It is relatively
slow to converge (that is, $N$ may become quite large before $|p - p_N|$ is sufficiently
small), and a good intermediate approximation might be inadvertently discarded. However,
the method has the important property that it always converges to a solution, and for that 
reason it is often used as a starter for the more efficient methods we will see later in this 
chapter. 


Suppose that $f \in C[a, b]$ and $f(a) \cdot f(b) < 0$. The Bisection method generates a sequence
$\{p_n\}_{n=1}^{\infty}$ approximating a zero $p$ of $f$ with

\[ |p_n - p| \le \frac{b - a}{2^n}, \quad \text{when } n \ge 1. \]


Proof For each $n \ge 1$, we have

\[ b_n - a_n = \frac{1}{2^{n-1}}(b - a) \quad\text{and}\quad p \in (a_n, b_n). \]

Since $p_n = \frac{1}{2}(a_n + b_n)$ for all $n \ge 1$, it follows that

\[ |p_n - p| \le \frac{1}{2}(b_n - a_n) = \frac{b - a}{2^n}. \]


Because

\[ |p_n - p| \le (b - a)\frac{1}{2^n}, \]

the sequence $\{p_n\}_{n=1}^{\infty}$ converges to $p$ with rate of convergence $O\left(\frac{1}{2^n}\right)$; that is,

\[ p_n = p + O\left(\frac{1}{2^n}\right). \]


It is important to realize that Theorem 2.1 gives only a bound for approximation error
and that this bound might be quite conservative. For example, this bound applied to the
problem in Example 1 ensures only that

\[ |p - p_9| \le \frac{2 - 1}{2^9} \approx 2 \times 10^{-3}, \]

but the actual error is much smaller:

\[ |p - p_9| = |1.365230013 - 1.365234375| \approx 4.4 \times 10^{-6}. \]


Example 2 Determine the number of iterations necessary to solve $f(x) = x^3 + 4x^2 - 10 = 0$ with
accuracy $10^{-3}$ using $a_1 = 1$ and $b_1 = 2$.
Solution We will use logarithms to find an integer $N$ that satisfies

\[ |p_N - p| \le 2^{-N}(b - a) = 2^{-N} < 10^{-3}. \]


Logarithms to any base would suffice, but we will use base-10 logarithms because the tolerance
is given as a power of 10. Since $2^{-N} < 10^{-3}$ implies that $\log_{10} 2^{-N} < \log_{10} 10^{-3} = -3$,
we have

\[ -N \log_{10} 2 < -3 \quad\text{and}\quad N > \frac{3}{\log_{10} 2} \approx 9.96. \]


Hence, ten iterations will ensure an approximation accurate to within $10^{-3}$.

Table 2.1 shows that the value of $p_9 = 1.365234375$ is accurate to within $10^{-4}$. Again,
it is important to keep in mind that the error analysis gives only a bound for the number of
iterations. In many cases this bound is much larger than the actual number required.
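
The logarithm computation above can be checked with a few lines of Python. The helper name `bisection_steps` is hypothetical; it uses base-2 logarithms, which, as noted in the solution, work as well as any other base.

```python
import math


def bisection_steps(a, b, tol):
    """Smallest integer N >= 0 with (b - a) / 2**N < tol (bound of Theorem 2.1).

    Assumes a < b and tol > 0.
    """
    # floor(...) + 1 rather than ceil(...) keeps the inequality strict
    # when (b - a) / tol is an exact power of 2.
    return max(0, math.floor(math.log2((b - a) / tol)) + 1)


N = bisection_steps(1.0, 2.0, 1e-3)
print(N)
```

For $a_1 = 1$, $b_1 = 2$ and tolerance $10^{-3}$ this gives $N = 10$, in agreement with the bound $N > 9.96$.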


Maple has a NumericalAnalysis package that implements many of the techniques we 
will discuss, and the presentation and examples in the package are closely aligned with this 
text. The Bisection method in this package has a number of options, some of which we will 
now consider. In what follows, Maple code is given in black italic type and Maple response 
in cyan. 

Load the NumericalAnalysis package with the command 


with(Student[NumericalAnalysis]) 

which gives access to the procedures in the package. Define the function with 
f := x^3 + 4x^2 - 10

and use 

Bisection (f ,x = [1,2], tolerance = 0.005) 

Maple returns 


1.363281250 


Note that the value that is output is the same as $p_8$ in Table 2.1.
The sequence of bisection intervals can be output with the command 


Bisection (f ,x = [1,2], tolerance = 0.005, output = sequence) 

and Maple returns the intervals containing the solution together with the solution 

[1.,2.], [1., 1.500000000], [1.250000000, 1.500000000], [1.250000000, 1.375000000], 
[1.312500000, 1.375000000], [1.343750000, 1.375000000], [1.359375000, 1.375000000], 
[1.359375000, 1.367187500], 1.363281250 

The stopping criterion can also be based on relative error by choosing the option 


Bisection (f ,x = [1,2], tolerance = 0.005, stoppingcriterion = relative) 
Now Maple returns

1.363281250

The option output = plot given in

Bisection (f, x = [1.25, 1.5], output = plot, tolerance = 0.02)

produces the plot shown in Figure 2.2.


Figure 2.2: 4 iteration(s) of the bisection method applied to f(x) = x^3 + 4x^2 - 10
with initial points a = 1.25 and b = 1.5


We can also set the maximum number of iterations with the option maxiterations = . 
An error message will be displayed if the stated tolerance is not met within the specified 
number of iterations. 

The results from the Bisection method can also be obtained using the command Roots. For
example,

Roots(f, x = [1.0, 2.0], method = bisection, tolerance = 1/100, output = information)

uses the Bisection method to produce the information

n   a_n           b_n           p_n           f(p_n)         relative error
1   1.0           2.0           1.500000000    2.37500000    0.3333333333
2   1.0           1.500000000   1.250000000   -1.796875000   0.2000000000
3   1.250000000   1.500000000   1.375000000    0.16210938    0.09090909091
4   1.250000000   1.375000000   1.312500000   -0.848388672   0.04761904762
5   1.312500000   1.375000000   1.343750000   -0.350982668   0.02325581395
6   1.343750000   1.375000000   1.359375000   -0.096408842   0.01149425287
7   1.359375000   1.375000000   1.367187500    0.03235578    0.005714285714
The bound for the number of iterations for the Bisection method assumes that the calculations
are performed using infinite-digit arithmetic. When implementing the method on
a computer, we need to consider the effects of round-off error. For example, the computation
of the midpoint of the interval $[a_n, b_n]$ should be found from the equation

\[ p_n = a_n + \frac{b_n - a_n}{2} \quad\text{instead of}\quad p_n = \frac{a_n + b_n}{2}. \]

The first equation adds a small correction, $(b_n - a_n)/2$, to the known value $a_n$. When $b_n - a_n$
is near the maximum precision of the machine, this correction might be in error, but the
error would not significantly affect the computed value of $p_n$. However, when $b_n - a_n$ is near
the maximum precision of the machine, it is possible for $(a_n + b_n)/2$ to return a midpoint
that is not even in the interval $[a_n, b_n]$.
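
The effect is easiest to see in low-precision decimal arithmetic. The sketch below is an illustration only: it assumes three-significant-digit rounding arithmetic, simulated with Python's `decimal` module, and the endpoints 0.982 and 0.984 are chosen here for the purpose, not taken from the text.

```python
from decimal import Decimal, localcontext


def midpoints(a, b, digits):
    """Compute both midpoint formulas, rounding every operation to `digits` digits."""
    with localcontext() as ctx:
        ctx.prec = digits
        averaged = (a + b) / 2         # (a_n + b_n)/2
        corrected = a + (b - a) / 2    # a_n + (b_n - a_n)/2
    return averaged, corrected


a, b = Decimal("0.982"), Decimal("0.984")
averaged, corrected = midpoints(a, b, 3)
print(averaged, corrected)
```

Here $a + b = 1.966$ rounds to $1.97$, so the averaged form returns $0.985$, which lies outside $[0.982, 0.984]$, while the corrected form returns $0.983$.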

The Latin word signum means "token" or "sign". So the signum function quite naturally returns
the sign of a number (unless the number is 0).

As a final remark, to determine which subinterval of $[a_n, b_n]$ contains a root of $f$, it is
better to make use of the signum function, which is defined as

\[ \operatorname{sgn}(x) = \begin{cases} -1, & \text{if } x < 0, \\ 0, & \text{if } x = 0, \\ 1, & \text{if } x > 0. \end{cases} \]


The test

\[ \operatorname{sgn}(f(a_n)) \operatorname{sgn}(f(p_n)) < 0 \quad\text{instead of}\quad f(a_n) f(p_n) < 0 \]

gives the same result but avoids the possibility of overflow or underflow in the multiplication
of $f(a_n)$ and $f(p_n)$.
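
A small Python illustration of the underflow case, assuming IEEE double-precision floats; the function values $\pm 10^{-200}$ are made up for the demonstration.

```python
def sgn(x):
    """Signum function: -1, 0, or 1 according to the sign of x."""
    return (x > 0) - (x < 0)


fa, fp = 1e-200, -1e-200           # opposite signs, both tiny

product_test = fa * fp < 0         # product underflows to -0.0, so the test fails
sign_test = sgn(fa) * sgn(fp) < 0  # signs alone are compared, so the test succeeds

print(product_test, sign_test)
```

The product test misses the sign change because $-10^{-400}$ is not representable and is rounded to zero, whereas the signum test detects it.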


EXERCISE SET 2.1 


1. Use the Bisection method to find $p_3$ for $f(x) = \sqrt{x} - \cos x$ on $[0, 1]$.

2. Let $f(x) = 3(x + 1)(x - \frac{1}{2})(x - 1)$. Use the Bisection method on the following intervals to find $p_3$.
   a. $[-2, 1.5]$   b. $[-1.25, 2.5]$

3. Use the Bisection method to find solutions accurate to within $10^{-2}$ for $x^3 - 7x^2 + 14x - 6 = 0$ on
   each interval.
   a. $[0, 1]$   b. $[1, 3.2]$   c. $[3.2, 4]$

4. Use the Bisection method to find solutions accurate to within $10^{-2}$ for $x^4 - 2x^3 - 4x^2 + 4x + 4 = 0$
   on each interval.
   a. $[-2, -1]$   b. $[0, 2]$   c. $[2, 3]$   d. $[-1, 0]$

5. Use the Bisection method to find solutions accurate to within $10^{-5}$ for the following problems.
   a. $x - 2^{-x} = 0$ for $0 \le x \le 1$
   b. $e^x - x^2 + 3x - 2 = 0$ for $0 \le x \le 1$
   c. $2x \cos(2x) - (x + 1)^2 = 0$ for $-3 \le x \le -2$ and $-1 \le x \le 0$
   d. $x \cos x - 2x^2 + 3x - 1 = 0$ for $0.2 \le x \le 0.3$ and $1.2 \le x \le 1.3$

6. Use the Bisection method to find solutions, accurate to within $10^{-5}$ for the following problems.
   a. $3x - e^x = 0$ for $1 \le x \le 2$
   b. $2x + 3\cos x - e^x = 0$ for $0 \le x \le 1$
   c. $x^2 - 4x + 4 - \ln x = 0$ for $1 \le x \le 2$ and $2 \le x \le 4$
   d. $x + 1 - 2\sin \pi x = 0$ for $0 \le x \le 0.5$ and $0.5 \le x \le 1$
7. a. Sketch the graphs of $y = x$ and $y = 2\sin x$.
   b. Use the Bisection method to find an approximation to within $10^{-5}$ to the first positive value of
      $x$ with $x = 2\sin x$.

8. a. Sketch the graphs of $y = x$ and $y = \tan x$.
   b. Use the Bisection method to find an approximation to within $10^{-5}$ to the first positive value of
      $x$ with $x = \tan x$.

9. a. Sketch the graphs of $y = e^x - 2$ and $y = \cos(e^x - 2)$.
   b. Use the Bisection method to find an approximation to within $10^{-5}$ to a value in $[0.5, 1.5]$ with
      $e^x - 2 = \cos(e^x - 2)$.

10. Let $f(x) = (x + 2)(x + 1)^2 x (x - 1)^3 (x - 2)$. To which zero of $f$ does the Bisection method converge
    when applied on the following intervals?
    a. $[-1.5, 2.5]$   b. $[-0.5, 2.4]$   c. $[-0.5, 3]$   d. $[-3, -0.5]$

11. Let $f(x) = (x + 2)(x + 1) x (x - 1)^3 (x - 2)$. To which zero of $f$ does the Bisection method converge
    when applied on the following intervals?
    a. $[-3, 2.5]$   b. $[-2.5, 3]$   c. $[-1.75, 1.5]$   d. $[-1.5, 1.75]$

12. Find an approximation to $\sqrt{3}$ correct to within $10^{-4}$ using the Bisection Algorithm. [Hint: Consider
    $f(x) = x^2 - 3$.]

13. Find an approximation to $\sqrt[3]{25}$ correct to within $10^{-4}$ using the Bisection Algorithm.

14. Use Theorem 2.1 to find a bound for the number of iterations needed to achieve an approximation
    with accuracy $10^{-3}$ to the solution of $x^3 + x - 4 = 0$ lying in the interval $[1, 4]$. Find an approximation
    to the root with this degree of accuracy.

15. Use Theorem 2.1 to find a bound for the number of iterations needed to achieve an approximation
    with accuracy $10^{-4}$ to the solution of $x^3 - x - 1 = 0$ lying in the interval $[1, 2]$. Find an approximation
    to the root with this degree of accuracy.

16. Let $f(x) = (x - 1)^{10}$, $p = 1$, and $p_n = 1 + 1/n$. Show that $|f(p_n)| < 10^{-3}$ whenever $n > 1$ but that
    $|p - p_n| < 10^{-3}$ requires that $n > 1000$.

17. Let $\{p_n\}$ be the sequence defined by $p_n = \sum_{k=1}^{n} \frac{1}{k}$. Show that $\{p_n\}$ diverges even though
    $\lim_{n \to \infty}(p_n - p_{n-1}) = 0$.

18. The function defined by $f(x) = \sin \pi x$ has zeros at every integer. Show that when $-1 < a < 0$ and
    $2 < b < 3$, the Bisection method converges to
    a. 0, if $a + b < 2$   b. 2, if $a + b > 2$   c. 1, if $a + b = 2$

19. A trough of length $L$ has a cross section in the shape of a semicircle with radius $r$. (See the accompanying
    figure.) When filled with water to within a distance $h$ of the top, the volume $V$ of water is

    \[ V = L\left[0.5\pi r^2 - r^2 \arcsin(h/r) - h(r^2 - h^2)^{1/2}\right]. \]

    Suppose $L = 10$ ft, $r = 1$ ft, and $V = 12.4$ ft$^3$. Find the depth of water in the trough to within 0.01 ft.

20. A particle starts at rest on a smooth inclined plane whose angle $\theta$ is changing at a constant rate

    \[ \frac{d\theta}{dt} = \omega < 0. \]

    At the end of $t$ seconds, the position of the object is given by

    \[ x(t) = -\frac{g}{2\omega^2}\left(\frac{e^{\omega t} - e^{-\omega t}}{2} - \sin \omega t\right). \]

    Suppose the particle has moved 1.7 ft in 1 s. Find, to within $10^{-5}$, the rate $\omega$ at which $\theta$ changes.
    Assume that $g = 32.17$ ft/s$^2$.


2.2 Fixed-Point Iteration 


A fixed point for a function is a number at which the value of the function does not change
when the function is applied.


Definition 2.2 The number $p$ is a fixed point for a given function $g$ if $g(p) = p$.


Fixed-point results occur in many areas of mathematics, and are a major tool of economists for
proving results concerning equilibria. Although the idea behind the technique is old, the
terminology was first used by the Dutch mathematician L. E. J. Brouwer (1882-1962) in
the early 1900s.


In this section we consider the problem of finding solutions to fixed-point problems
and the connection between the fixed-point problems and the root-finding problems we
wish to solve. Root-finding problems and fixed-point problems are equivalent classes in the
following sense:


• Given a root-finding problem $f(p) = 0$, we can define functions $g$ with a fixed point at
$p$ in a number of ways, for example, as

\[ g(x) = x - f(x) \quad\text{or as}\quad g(x) = x + 3f(x). \]

• Conversely, if the function $g$ has a fixed point at $p$, then the function defined by

\[ f(x) = x - g(x) \]

has a zero at $p$.


Although the problems we wish to solve are in the root-finding form, the fixed-point
form is easier to analyze, and certain fixed-point choices lead to very powerful root-finding
techniques.

We first need to become comfortable with this new type of problem, and to decide
when a function has a fixed point and how the fixed points can be approximated to within
a specified accuracy.


Example 1 Determine any fixed points of the function $g(x) = x^2 - 2$.

Solution A fixed point $p$ for $g$ has the property that

\[ p = g(p) = p^2 - 2, \quad\text{which implies that}\quad 0 = p^2 - p - 2 = (p + 1)(p - 2). \]

A fixed point for $g$ occurs precisely when the graph of $y = g(x)$ intersects the graph of
$y = x$, so $g$ has two fixed points, one at $p = -1$ and the other at $p = 2$. These are shown in
Figure 2.3.
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Figure 2.3 


The following theorem gives sufficient conditions for the existence and uniqueness of 
a fixed point. 


Theorem 2.3 (i) If $g \in C[a, b]$ and $g(x) \in [a, b]$ for all $x \in [a, b]$, then $g$ has at least one fixed
point in $[a, b]$.


(ii) If, in addition, $g'(x)$ exists on $(a, b)$ and a positive constant $k < 1$ exists with

\[ |g'(x)| \le k, \quad \text{for all } x \in (a, b), \]

then there is exactly one fixed point in $[a, b]$. (See Figure 2.4.)


Figure 2.4 


Proof 


(i) If $g(a) = a$ or $g(b) = b$, then $g$ has a fixed point at an endpoint. If not, then
$g(a) > a$ and $g(b) < b$. The function $h(x) = g(x) - x$ is continuous on $[a, b]$, with

\[ h(a) = g(a) - a > 0 \quad\text{and}\quad h(b) = g(b) - b < 0. \]

The Intermediate Value Theorem implies that there exists $p \in (a, b)$ for which
$h(p) = 0$. This number $p$ is a fixed point for $g$ because

\[ 0 = h(p) = g(p) - p \quad\text{implies that}\quad g(p) = p. \]


(ii) Suppose, in addition, that $|g'(x)| \le k < 1$ and that $p$ and $q$ are both fixed points
in $[a, b]$. If $p \ne q$, then the Mean Value Theorem implies that a number $\xi$ exists
between $p$ and $q$, and hence in $[a, b]$, with

\[ \frac{g(p) - g(q)}{p - q} = g'(\xi). \]

Thus

\[ |p - q| = |g(p) - g(q)| = |g'(\xi)||p - q| \le k|p - q| < |p - q|, \]

which is a contradiction. This contradiction must come from the only supposition,
$p \ne q$. Hence, $p = q$ and the fixed point in $[a, b]$ is unique.


Example 2 Show that $g(x) = (x^2 - 1)/3$ has a unique fixed point on the interval $[-1, 1]$.


Solution The maximum and minimum values of $g(x)$ for $x$ in $[-1, 1]$ must occur either
when $x$ is an endpoint of the interval or when the derivative is 0. Since $g'(x) = 2x/3$, the
function $g$ is continuous and $g'(x)$ exists on $[-1, 1]$. The maximum and minimum values
of $g(x)$ occur at $x = -1$, $x = 0$, or $x = 1$. But $g(-1) = 0$, $g(1) = 0$, and $g(0) = -1/3$,
so an absolute maximum for $g(x)$ on $[-1, 1]$ occurs at $x = -1$ and $x = 1$, and an absolute
minimum at $x = 0$.

Moreover

\[ |g'(x)| = \left|\frac{2x}{3}\right| \le \frac{2}{3}, \quad \text{for all } x \in (-1, 1). \]

So $g$ satisfies all the hypotheses of Theorem 2.3 and has a unique fixed point in $[-1, 1]$.


For the function in Example 2, the unique fixed point $p$ in the interval $[-1, 1]$ can be
determined algebraically. If

\[ p = g(p) = \frac{p^2 - 1}{3}, \quad\text{then}\quad p^2 - 3p - 1 = 0, \]

which, by the quadratic formula, implies, as shown on the left graph in Figure 2.5, that

\[ p = \frac{1}{2}\left(3 - \sqrt{13}\right). \]

Note that $g$ also has a unique fixed point $p = \frac{1}{2}(3 + \sqrt{13})$ for the interval $[3, 4]$.
However, $g(4) = 5$ and $g'(4) = \frac{8}{3} > 1$, so $g$ does not satisfy the hypotheses of Theorem 2.3
on $[3, 4]$. This demonstrates that the hypotheses of Theorem 2.3 are sufficient to guarantee
a unique fixed point but are not necessary. (See the graph on the right in Figure 2.5.)
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Figure 2.5 


Example 3 Show that Theorem 2.3 does not ensure a unique fixed point of $g(x) = 3^{-x}$ on the interval
$[0, 1]$, even though a unique fixed point on this interval does exist.


Solution Since $g'(x) = -3^{-x} \ln 3 < 0$ on $[0, 1]$, the function $g$ is strictly decreasing on $[0, 1]$. So

\[ g(1) = \frac{1}{3} \le g(x) \le 1 = g(0), \quad \text{for } 0 \le x \le 1. \]

Thus, for $x \in [0, 1]$, we have $g(x) \in [0, 1]$. The first part of Theorem 2.3 ensures that there
is at least one fixed point in $[0, 1]$.
However,

\[ g'(0) = -\ln 3 = -1.098612289, \]

so $|g'(x)| \not\le 1$ on $(0, 1)$, and Theorem 2.3 cannot be used to determine uniqueness. But $g$ is
always decreasing, and it is clear from Figure 2.6 that the fixed point must be unique.


Figure 2.6 
Fixed-Point Iteration


We cannot explicitly determine the fixed point in Example 3 because we have no way to
solve for $p$ in the equation $p = g(p) = 3^{-p}$. We can, however, determine approximations
to this fixed point to any specified degree of accuracy. We will now consider how this can
be done.

To approximate the fixed point of a function $g$, we choose an initial approximation $p_0$
and generate the sequence $\{p_n\}_{n=0}^{\infty}$ by letting $p_n = g(p_{n-1})$, for each $n \ge 1$. If the sequence
converges to $p$ and $g$ is continuous, then

\[ p = \lim_{n \to \infty} p_n = \lim_{n \to \infty} g(p_{n-1}) = g\left(\lim_{n \to \infty} p_{n-1}\right) = g(p), \]

and a solution to $x = g(x)$ is obtained. This technique is called fixed-point, or functional
iteration. The procedure is illustrated in Figure 2.7 and detailed in Algorithm 2.2.


Figure 2.7


Fixed-Point Iteration 


To find a solution to $p = g(p)$ given an initial approximation $p_0$:


INPUT initial approximation $p_0$; tolerance TOL; maximum number of iterations $N_0$.
OUTPUT approximate solution $p$ or message of failure.
Step 1 Set $i = 1$.
Step 2 While $i \le N_0$ do Steps 3-6.
    Step 3 Set $p = g(p_0)$. (Compute $p_i$.)
    Step 4 If $|p - p_0| <$ TOL then
              OUTPUT ($p$); (The procedure was successful.)
              STOP.
    Step 5 Set $i = i + 1$.
    Step 6 Set $p_0 = p$. (Update $p_0$.)
Step 7 OUTPUT ('The method failed after $N_0$ iterations, $N_0$ =', $N_0$);
       (The procedure was unsuccessful.)
       STOP.
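
The steps of the algorithm translate almost line for line into Python. The sketch below is one possible rendering, not the text's own code; the function name `fixed_point` is chosen here, and the sample call uses $g(x) = 3^{-x}$ from Example 3 with an arbitrarily chosen starting value $p_0 = 0.5$.

```python
def fixed_point(g, p0, tol, max_iter):
    """Fixed-point iteration: return (p, i) with |p_i - p_{i-1}| < tol.

    Raises RuntimeError if the tolerance is not met within max_iter steps.
    """
    for i in range(1, max_iter + 1):   # Steps 1, 2 and 5: i = 1, ..., N_0
        p = g(p0)                      # Step 3: compute p_i
        if abs(p - p0) < tol:          # Step 4: successive iterates agree
            return p, i
        p0 = p                         # Step 6: update p_0
    # Step 7: the procedure was unsuccessful
    raise RuntimeError(f"The method failed after {max_iter} iterations")


p, iterations = fixed_point(lambda x: 3.0 ** (-x), 0.5, 1e-8, 100)
print(p, iterations)
```

Note that the stopping test measures the distance between successive iterates, which is not the same as the distance to the fixed point itself.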


The following illustrates some features of functional iteration. 


Illustration The equation $x^3 + 4x^2 - 10 = 0$ has a unique root in $[1, 2]$. There are many ways to change
the equation to the fixed-point form $x = g(x)$ using simple algebraic manipulation. For
example, to obtain the function $g$ described in part (c), we can manipulate the equation
$x^3 + 4x^2 - 10 = 0$ as follows:

\[ 4x^2 = 10 - x^3, \quad\text{so}\quad x^2 = \frac{1}{4}(10 - x^3), \quad\text{and}\quad x = \pm\frac{1}{2}(10 - x^3)^{1/2}. \]

To obtain a positive solution, $g_3(x)$ is chosen. It is not important for you to derive the
functions shown here, but you should verify that the fixed point of each is actually a solution
to the original equation, $x^3 + 4x^2 - 10 = 0$.

(a) $x = g_1(x) = x - x^3 - 4x^2 + 10$
(b) $x = g_2(x) = \left(\frac{10}{x} - 4x\right)^{1/2}$
(c) $x = g_3(x) = \frac{1}{2}(10 - x^3)^{1/2}$
(d) $x = g_4(x) = \left(\frac{10}{4 + x}\right)^{1/2}$
(e) $x = g_5(x) = x - \frac{x^3 + 4x^2 - 10}{3x^2 + 8x}$

With $p_0 = 1.5$, Table 2.2 lists the results of the fixed-point iteration for all five choices of $g$.


Table 2.2

 n   (a)          (b)            (c)           (d)           (e)
 0   1.5          1.5            1.5           1.5           1.5
 1   -0.875       0.8165         1.286953768   1.348399725   1.373333333
 2   6.732        2.9969         1.402540804   1.367376372   1.365262015
 3   -469.7       (-8.65)^(1/2)  1.345458374   1.364957015   1.365230014
 4   1.03 x 10^8                 1.375170253   1.365264748   1.365230013
 5                               1.360094193   1.365225594
 6                               1.367846968   1.365230576
 7                               1.363887004   1.365229942
 8                               1.365916734   1.365230022
 9                               1.364878217   1.365230012
10                               1.365410062   1.365230014
15                               1.365223680   1.365230013
20                               1.365230236
25                               1.365230006
30                               1.365230013


The actual root is 1.365230013, as was noted in Example 1 of Section 2.1. Comparing the 
results to the Bisection Algorithm given in that example, it can be seen that excellent results 
have been obtained for choices (c), (d), and (e) (the Bisection method requires 27 iterations 
for this accuracy). It is interesting to note that choice (a) was divergent and that (b) became 
undefined because it involved the square root of a negative number. 
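
The convergent columns of the table can be explored with a short Python sketch. The helper `iterate` and the tolerance $10^{-9}$ on successive iterates are choices made here for illustration, so the iteration counts it reports need not coincide exactly with the rows at which the table entries settle.

```python
import math


def iterate(g, p0, tol=1e-9, max_iter=100):
    """Apply p_n = g(p_{n-1}) until successive iterates differ by less than tol."""
    for n in range(1, max_iter + 1):
        p = g(p0)
        if abs(p - p0) < tol:
            return p, n
        p0 = p
    raise RuntimeError("no convergence")


schemes = {
    "c": lambda x: 0.5 * math.sqrt(10 - x**3),
    "d": lambda x: math.sqrt(10 / (4 + x)),
    "e": lambda x: x - (x**3 + 4 * x**2 - 10) / (3 * x**2 + 8 * x),
}

results = {name: iterate(g, 1.5) for name, g in schemes.items()}
for name, (root, count) in results.items():
    print(name, root, count)
```

All three choices approach the same root, with (e) needing the fewest iterations and (c) the most, which is the ordering visible in the table.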
Although the various functions we have given are fixed-point problems for the same
root-finding problem, they differ vastly as techniques for approximating the solution to the
root-finding problem. Their purpose is to illustrate what needs to be answered:


• Question: How can we find a fixed-point problem that produces a sequence that reliably
and rapidly converges to a solution to a given root-finding problem?


The following theorem and its corollary give us some clues concerning the paths we
should pursue and, perhaps more importantly, some we should reject.


Theorem 2.4 (Fixed-Point Theorem)


Let $g \in C[a, b]$ be such that $g(x) \in [a, b]$, for all $x$ in $[a, b]$. Suppose, in addition, that
$g'$ exists on $(a, b)$ and that a constant $0 < k < 1$ exists with

\[ |g'(x)| \le k, \quad \text{for all } x \in (a, b). \]

Then for any number $p_0$ in $[a, b]$, the sequence defined by

\[ p_n = g(p_{n-1}), \quad n \ge 1, \]

converges to the unique fixed point $p$ in $[a, b]$.


Proof Theorem 2.3 implies that a unique point $p$ exists in $[a, b]$ with $g(p) = p$. Since $g$
maps $[a, b]$ into itself, the sequence $\{p_n\}_{n=0}^{\infty}$ is defined for all $n \ge 0$, and $p_n \in [a, b]$ for all
$n$. Using the fact that $|g'(x)| \le k$ and the Mean Value Theorem 1.8, we have, for each $n$,

\[ |p_n - p| = |g(p_{n-1}) - g(p)| = |g'(\xi_n)||p_{n-1} - p| \le k|p_{n-1} - p|, \]

where $\xi_n \in (a, b)$. Applying this inequality inductively gives

\[ |p_n - p| \le k|p_{n-1} - p| \le k^2|p_{n-2} - p| \le \cdots \le k^n|p_0 - p|. \tag{2.4} \]

Since $0 < k < 1$, we have $\lim_{n \to \infty} k^n = 0$ and

\[ \lim_{n \to \infty} |p_n - p| \le \lim_{n \to \infty} k^n|p_0 - p| = 0. \]

Hence $\{p_n\}_{n=0}^{\infty}$ converges to $p$.


Corollary 2.5 If $g$ satisfies the hypotheses of Theorem 2.4, then bounds for the error involved in using $p_n$
to approximate $p$ are given by

\[ |p_n - p| \le k^n \max\{p_0 - a, b - p_0\} \tag{2.5} \]

and

\[ |p_n - p| \le \frac{k^n}{1 - k}|p_1 - p_0|, \quad \text{for all } n \ge 1. \tag{2.6} \]


Proof Because $p \in [a, b]$, the first bound follows from Inequality (2.4):

\[ |p_n - p| \le k^n|p_0 - p| \le k^n \max\{p_0 - a, b - p_0\}. \]

For $n \ge 1$, the procedure used in the proof of Theorem 2.4 implies that

\[ |p_{n+1} - p_n| = |g(p_n) - g(p_{n-1})| \le k|p_n - p_{n-1}| \le \cdots \le k^n|p_1 - p_0|. \]

Thus for $m > n \ge 1$,

\begin{align*}
|p_m - p_n| &= |p_m - p_{m-1} + p_{m-1} - \cdots + p_{n+1} - p_n| \\
&\le |p_m - p_{m-1}| + |p_{m-1} - p_{m-2}| + \cdots + |p_{n+1} - p_n| \\
&\le k^{m-1}|p_1 - p_0| + k^{m-2}|p_1 - p_0| + \cdots + k^n|p_1 - p_0| \\
&= k^n|p_1 - p_0|\left(1 + k + k^2 + \cdots + k^{m-n-1}\right).
\end{align*}

By Theorem 2.4, $\lim_{m \to \infty} p_m = p$, so

\[ |p - p_n| = \lim_{m \to \infty} |p_m - p_n| \le \lim_{m \to \infty} k^n|p_1 - p_0| \sum_{i=0}^{m-n-1} k^i \le k^n|p_1 - p_0| \sum_{i=0}^{\infty} k^i. \]

But $\sum_{i=0}^{\infty} k^i$ is a geometric series with ratio $k$ and $0 < k < 1$. This series converges to
$1/(1 - k)$, which gives the second bound:

\[ |p - p_n| \le \frac{k^n}{1 - k}|p_1 - p_0|. \]


Both inequalities in the corollary relate the rate at which $\{p_n\}_{n=0}^{\infty}$ converges to the bound
$k$ on the first derivative. The rate of convergence depends on the factor $k^n$. The smaller the
value of $k$, the faster the convergence, which may be very slow if $k$ is close to 1.
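
To see how strongly the factor $k^n$ depends on $k$, bound (2.5) can be solved for $n$. The sketch below is illustrative only: the function name, the interval $[1, 2]$, the starting value $p_0 = 1.5$, the tolerance $10^{-6}$, and the three sample values of $k$ are all assumptions chosen here.

```python
import math


def iterations_for_bound(k, p0, a, b, tol):
    """Smallest n >= 0 with k**n * max(p0 - a, b - p0) <= tol, from bound (2.5)."""
    if not 0 < k < 1:
        raise ValueError("k must satisfy 0 < k < 1")
    m = max(p0 - a, b - p0)
    if m <= tol:
        return 0
    # k**n * m <= tol  <=>  n >= log(tol / m) / log(k), since log(k) < 0
    return math.ceil(math.log(tol / m) / math.log(k))


counts = [iterations_for_bound(k, 1.5, 1.0, 2.0, 1e-6) for k in (0.1, 0.5, 0.9)]
print(counts)
```

Under these assumptions the guaranteed iteration counts grow from single digits for $k = 0.1$ to well over one hundred for $k = 0.9$. These are bounds only; the actual number of iterations needed may be smaller.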


Illustration Let us reconsider the various fixed-point schemes described in the preceding illustration in 
light of the Fixed-point Theorem 2.4 and its Corollary 2.5. 


(a) For $g_1(x) = x - x^3 - 4x^2 + 10$, we have $g_1(1) = 6$ and $g_1(2) = -12$, so $g_1$ does
not map $[1, 2]$ into itself. Moreover, $g_1'(x) = 1 - 3x^2 - 8x$, so $|g_1'(x)| > 1$ for all $x$
in $[1, 2]$. Although Theorem 2.4 does not guarantee that the method must fail for this
choice of $g$, there is no reason to expect convergence.

(b) With $g_2(x) = [(10/x) - 4x]^{1/2}$, we can see that $g_2$ does not map $[1, 2]$ into $[1, 2]$, and
the sequence $\{p_n\}_{n=0}^{\infty}$ is not defined when $p_0 = 1.5$. Moreover, there is no interval
containing $p \approx 1.365$ such that $|g_2'(x)| < 1$, because $|g_2'(p)| \approx 3.4$. There is no reason
to expect that this method will converge.

(c) For the function $g_3(x) = \frac{1}{2}(10 - x^3)^{1/2}$, we have

\[ g_3'(x) = -\frac{3}{4}x^2(10 - x^3)^{-1/2} < 0 \quad \text{on } [1, 2], \]

so $g_3$ is strictly decreasing on $[1, 2]$. However, $|g_3'(2)| \approx 2.12$, so the condition
$|g_3'(x)| \le k < 1$ fails on $[1, 2]$. A closer examination of the sequence $\{p_n\}_{n=0}^{\infty}$ starting
with $p_0 = 1.5$ shows that it suffices to consider the interval $[1, 1.5]$ instead of $[1, 2]$. On
this interval it is still true that $g_3'(x) < 0$ and $g_3$ is strictly decreasing, but, additionally,

\[ 1 < 1.28 \approx g_3(1.5) \le g_3(x) \le g_3(1) = 1.5, \]

for all $x \in [1, 1.5]$. This shows that $g_3$ maps the interval $[1, 1.5]$ into itself. It is also
true that $|g_3'(x)| \le |g_3'(1.5)| \approx 0.66$ on this interval, so Theorem 2.4 confirms the
convergence of which we were already aware.

(d) For $g_4(x) = (10/(4 + x))^{1/2}$, we have

\[ |g_4'(x)| = \left|\frac{-5}{\sqrt{10}(4 + x)^{3/2}}\right| \le \frac{5}{\sqrt{10}(5)^{3/2}} < 0.15, \quad \text{for all } x \in [1, 2]. \]

The bound on the magnitude of $g_4'(x)$ is much smaller than the bound (found in (c))
on the magnitude of $g_3'(x)$, which explains the more rapid convergence using $g_4$.

(e) The sequence defined by

\[ g_5(x) = x - \frac{x^3 + 4x^2 - 10}{3x^2 + 8x} \]

converges much more rapidly than our other choices. In the next sections we will see
where this choice came from and why it is so effective.


From what we have seen, 


• Question: How can we find a fixed-point problem that produces a sequence that reliably
and rapidly converges to a solution to a given root-finding problem?


might have


• Answer: Manipulate the root-finding problem into a fixed point problem that satisfies the
conditions of Fixed-Point Theorem 2.4 and has a derivative that is as small as possible
near the fixed point.


In the next sections we will examine this in more detail. 

Maple has the fixed-point algorithm implemented in its NumericalAnalysis package. 
The options for the Bisection method are also available for fixed-point iteration. We will 
show only one option. After accessing the package using with(Student[NumericalAnalysis]):
we enter the function

g := x - (x^3 + 4x^2 - 10)/(3x^2 + 8x)

and Maple returns

x - (x^3 + 4x^2 - 10)/(3x^2 + 8x)

Enter the command

FixedPointIteration(fixedpointiterator = g, x = 1.5, tolerance = 10^(-8), output = sequence,
maxiterations = 20)


and Maple returns 


1.5, 1.373333333, 1.365262015, 1.365230014, 1.365230013 


EXERCISE SET 2.2


1. Use algebraic manipulation to show that each of the following functions has a fixed point at $p$ precisely
   when $f(p) = 0$, where $f(x) = x^4 + 2x^2 - x - 3$.
   a. $g_1(x) = (3 + x - 2x^2)^{1/4}$
   b. $g_2(x) = \left(\frac{x + 3 - x^4}{2}\right)^{1/2}$
   c. $g_3(x) = \left(\frac{x + 3}{x^2 + 2}\right)^{1/2}$
   d. $g_4(x) = \frac{3x^4 + 2x^2 + 3}{4x^3 + 4x - 1}$

2. a. Perform four iterations, if possible, on each of the functions $g$ defined in Exercise 1. Let $p_0 = 1$
      and $p_{n+1} = g(p_n)$, for $n = 0, 1, 2, 3$.
   b. Which function do you think gives the best approximation to the solution?

3. The following four methods are proposed to compute $21^{1/3}$. Rank them in order, based on their
   apparent speed of convergence, assuming $p_0 = 1$.
   a. $p_n = \frac{20p_{n-1} + 21/p_{n-1}^2}{21}$
   b. $p_n = p_{n-1} - \frac{p_{n-1}^3 - 21}{3p_{n-1}^2}$
   c. $p_n = p_{n-1} - \frac{p_{n-1}^4 - 21p_{n-1}}{p_{n-1}^2 - 21}$
   d. $p_n = \left(\frac{21}{p_{n-1}}\right)^{1/2}$

4. The following four methods are proposed to compute $7^{1/5}$. Rank them in order, based on their apparent
   speed of convergence, assuming $p_0 = 1$.
   a. $p_n = p_{n-1}\left(1 + \frac{7 - p_{n-1}^5}{p_{n-1}^2}\right)^3$
   b. $p_n = p_{n-1} - \frac{p_{n-1}^5 - 7}{p_{n-1}^2}$
   c. $p_n = p_{n-1} - \frac{p_{n-1}^5 - 7}{5p_{n-1}^4}$
   d. $p_n = p_{n-1} - \frac{p_{n-1}^5 - 7}{12}$

5. Use a fixed-point iteration method to determine a solution accurate to within $10^{-2}$ for $x^4 - 3x^2 - 3 = 0$
   on $[1, 2]$. Use $p_0 = 1$.

6. Use a fixed-point iteration method to determine a solution accurate to within $10^{-2}$ for $x^3 - x - 1 = 0$
   on $[1, 2]$. Use $p_0 = 1$.

7. Use Theorem 2.3 to show that $g(x) = \pi + 0.5\sin(x/2)$ has a unique fixed point on $[0, 2\pi]$. Use
   fixed-point iteration to find an approximation to the fixed point that is accurate to within $10^{-2}$. Use
   Corollary 2.5 to estimate the number of iterations required to achieve $10^{-2}$ accuracy, and compare
   this theoretical estimate to the number actually needed.

8. Use Theorem 2.3 to show that $g(x) = 2^{-x}$ has a unique fixed point on $[\frac{1}{3}, 1]$. Use fixed-point iteration
   to find an approximation to the fixed point accurate to within $10^{-4}$. Use Corollary 2.5 to estimate the
   number of iterations required to achieve $10^{-4}$ accuracy, and compare this theoretical estimate to the
   number actually needed.

9. Use a fixed-point iteration method to find an approximation to $\sqrt{3}$ that is accurate to within $10^{-4}$.
   Compare your result and the number of iterations required with the answer obtained in Exercise 12
   of Section 2.1.

10. Use a fixed-point iteration method to find an approximation to $\sqrt[3]{25}$ that is accurate to within $10^{-4}$.
    Compare your result and the number of iterations required with the answer obtained in Exercise 13
    of Section 2.1.

11. For each of the following equations, determine an interval $[a, b]$ on which fixed-point iteration will
    converge. Estimate the number of iterations necessary to obtain approximations accurate to within
    $10^{-5}$, and perform the calculations.
    a. $x = \frac{2 - e^x + x^2}{3}$   b. $x = \frac{5}{x^2} + 2$
    c. $x = (e^x/3)^{1/2}$   d. $x = 5^{-x}$
    e. $x = 6^{-x}$   f. $x = 0.5(\sin x + \cos x)$

12. For each of the following equations, use the given interval or determine an interval $[a, b]$ on which
    fixed-point iteration will converge. Estimate the number of iterations necessary to obtain approximations
    accurate to within $10^{-5}$, and perform the calculations.
    a. $2 + \sin x - x = 0$ use $[2, 3]$   b. $x^3 - 2x - 5 = 0$ use $[2, 3]$
    c. $3x^2 - e^x = 0$   d. $x - \cos x = 0$

13. Find all the zeros of $f(x) = x^2 + 10\cos x$ by using the fixed-point iteration method for an appropriate
    iteration function $g$. Find the zeros accurate to within $10^{-4}$.
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14. 


15. 


16. 


17. 


18. 


19. 


20. 


21. 


22. 


23. 


24. 


Solutions of Equations in One Variable 


Use a fixed-point iteration method to determine a solution accurate to within 10~* for x = tan x, for 
x in [4,5]. 

Use a fixed-point iteration method to determine a solution accurate to within 10~? for 2 sinx+x = 0 
on [1,2]. Use po = 1. 

Let A be a given positive constant and g(x) = 2x — Ax?. 


a. Show that if fixed-point iteration converges to a nonzero limit, then the limit is p = 1/A, so the 
inverse of a number can be found using only multiplications and subtractions. 


b. Find an interval about 1/A for which fixed-point iteration converges, provided po is in that 
interval. 


Find a function g defined on [0, 1] that satisfies none of the hypotheses of Theorem 2.3 but still has a 

unique fixed point on [0, 1]. 

a. Show that Theorem 2.2 is true if the inequality |g’(x)| < k is replaced by g/(x) < k, for all 
x € (a,b). [Hint: Only uniqueness is in question. ] 

b. Show that Theorem 2.3 may not hold if inequality |g’(x)| < k is replaced by g(x) < k. [Hint: 
Show that g(x) = 1 — x’, for x in [0, 1], provides a counterexample. ] 


a. Use Theorem 2.4 to show that the sequence defined by 


1 
Xn = 5Xn-1 + > forn > 1, 
2 Xn-1 


converges to /2 whenever xy > V2. 
b. Use the fact that 0 < (x) — /2)? whenever xo # /2 to show that if0 < x9 < J2,thenx, > J2. 


Use the results of parts (a) and (b) to show that the sequence in (a) converges to /2 whenever 
Xo > 0. 


a. Show that if A is any positive number, then the sequence defined by 
1 


Xn = FXn-1 + 
2 2Xn-1 


, forn>1, 


converges to VA whenever x9 > 0. 
b. What happens if x) < 0? 


Replace the assumption in Theorem 2.4 that “a positive number k < 1 exists with |g’(x)| < k” with 
“g satisfies a Lipschitz condition on the interval [a, b] with Lipschitz constant L < 1.” (See Exercise 
27, Section 1.1.) Show that the conclusions of this theorem are still valid. 


Suppose that g is continuously differentiable on some interval (c,d) that contains the fixed point 
p of g. Show that if |g’(p)| < 1, then there exists a 6 > O such that if |p) — p| < 6, then the 
fixed-point iteration converges. 


An object falling vertically through the air is subjected to viscous resistance as well as to the force
of gravity. Assume that an object with mass $m$ is dropped from a height $s_0$ and that the height of the
object after $t$ seconds is

\[ s(t) = s_0 - \frac{mg}{k} t + \frac{m^2 g}{k^2}\left(1 - e^{-kt/m}\right), \]

where $g = 32.17$ ft/s$^2$ and $k$ represents the coefficient of air resistance in lb-s/ft. Suppose $s_0 = 300$ ft,
$m = 0.25$ lb, and $k = 0.1$ lb-s/ft. Find, to within 0.01 s, the time it takes this quarter-pounder to hit the
ground.


Let $g \in C^1[a, b]$ and $p$ be in $(a, b)$ with $g(p) = p$ and $|g'(p)| > 1$. Show that there exists a $\delta > 0$ such
that if $0 < |p_0 - p| < \delta$, then $|p_0 - p| < |p_1 - p|$. Thus, no matter how close the initial approximation
$p_0$ is to $p$, the next iterate $p_1$ is farther away, so the fixed-point iteration does not converge if $p_0 \ne p$.
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| 2.3 Newton's Method and Its Extensions 


Isaac Newton (1641-1727) was 
one of the most brilliant scientists 
of all time. The late 17th century 
was a vibrant period for science 
and mathematics and Newton’s 
work touched nearly every aspect 
of mathematics. His method for 
solving was introduced to find
a root of the equation
$y^3 - 2y - 5 = 0$. Although he
demonstrated the method only for 
polynomials, it is clear that he 
realized its broader applications. 


Joseph Raphson (1648-1715) 
gave a description of the method 
attributed to Isaac Newton in 
1690, acknowledging Newton as 
the source of the discovery. 
Neither Newton nor Raphson 
explicitly used the derivative in 
their description since both 
considered only polynomials. 
Other mathematicians, 
particularly James Gregory 
(1636-1675), were aware of the 
underlying process at or before 
this time. 


Newton’s (or the Newton-Raphson) method is one of the most powerful and well-known 
numerical methods for solving a root-finding problem. There are many ways of introducing 
Newton’s method. 


Newton's Method 


If we only want an algorithm, we can consider the technique graphically, as is often done in 
calculus. Another possibility is to derive Newton’s method as a technique to obtain faster 
convergence than offered by other types of functional iteration, as is done in Section 2.4. A 
third means of introducing Newton’s method, which is discussed next, is based on Taylor 
polynomials. We will see there that this particular derivation produces not only the method, 
but also a bound for the error of the approximation. 

Suppose that $f \in C^2[a, b]$. Let $p_0 \in [a, b]$ be an approximation to $p$ such that $f'(p_0) \ne 0$
and $|p - p_0|$ is "small." Consider the first Taylor polynomial for $f(x)$ expanded about $p_0$
and evaluated at $x = p$:

\[ f(p) = f(p_0) + (p - p_0) f'(p_0) + \frac{(p - p_0)^2}{2} f''(\xi(p)), \]

where $\xi(p)$ lies between $p$ and $p_0$. Since $f(p) = 0$, this equation gives

\[ 0 = f(p_0) + (p - p_0) f'(p_0) + \frac{(p - p_0)^2}{2} f''(\xi(p)). \]

Newton's method is derived by assuming that since $|p - p_0|$ is small, the term involving
$(p - p_0)^2$ is much smaller, so

\[ 0 \approx f(p_0) + (p - p_0) f'(p_0). \]

Solving for $p$ gives

\[ p \approx p_0 - \frac{f(p_0)}{f'(p_0)}. \]


This sets the stage for Newton's method, which starts with an initial approximation $p_0$
and generates the sequence $\{p_n\}_{n=0}^{\infty}$ by

\[ p_n = p_{n-1} - \frac{f(p_{n-1})}{f'(p_{n-1})}, \quad \text{for } n \ge 1. \tag{2.7} \]


Figure 2.8 on page 68 illustrates how the approximations are obtained using successive
tangents. (Also see Exercise 15.) Starting with the initial approximation $p_0$, the
approximation $p_1$ is the x-intercept of the tangent line to the graph of $f$ at $(p_0, f(p_0))$. The
approximation $p_2$ is the x-intercept of the tangent line to the graph of $f$ at $(p_1, f(p_1))$ and
so on. Algorithm 2.3 follows this procedure.
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Figure 2.8 


[Figure 2.8: the tangent line to the graph of $f$ at $(p_0, f(p_0))$ has slope $f'(p_0)$; the tangent line at $(p_1, f(p_1))$ has slope $f'(p_1)$.]


Newton’s 


To find a solution to f(x) = 0 given an initial approximation p0:


INPUT  initial approximation p0; tolerance TOL; maximum number of iterations N0.
OUTPUT approximate solution p or message of failure.
Step 1 Set i = 1.
Step 2 While i ≤ N0 do Steps 3-6.
    Step 3 Set p = p0 - f(p0)/f'(p0). (Compute p_i.)
    Step 4 If |p - p0| < TOL then
               OUTPUT (p); (The procedure was successful.)
               STOP.
    Step 5 Set i = i + 1.
    Step 6 Set p0 = p. (Update p0.)
Step 7 OUTPUT ('The method failed after N0 iterations, N0 =', N0);
       (The procedure was unsuccessful.)
       STOP.
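The steps of Algorithm 2.3 can be sketched in Python as follows. This is a minimal illustration, not a definitive implementation: it assumes the caller supplies both $f$ and $f'$ as callables, and the names `newton`, `df`, `tol`, and `max_iter` are illustrative choices. The zero-derivative check is an added safeguard that is not part of the algorithm as stated.

```python
import math


def newton(f, df, p0, tol=1e-10, max_iter=50):
    """Sketch of Algorithm 2.3: return (p, i), the approximation and the step index."""
    for i in range(1, max_iter + 1):          # Steps 1, 2, and 5
        slope = df(p0)
        if slope == 0.0:                      # added safeguard (assumption, not in the algorithm)
            raise ZeroDivisionError("f'(p0) = 0; the iteration cannot continue")
        p = p0 - f(p0) / slope                # Step 3: compute p_i
        if abs(p - p0) < tol:                 # Step 4: stopping test of the form (2.8)
            return p, i
        p0 = p                                # Step 6: update p0
    raise RuntimeError(f"The method failed after {max_iter} iterations")  # Step 7


# Illustrative use with f(x) = cos x - x and p0 = pi/4.
root, steps = newton(lambda x: math.cos(x) - x,
                     lambda x: -math.sin(x) - 1.0,
                     math.pi / 4)
print(root, steps)
```

Raising an exception in place of the failure message is a design choice for this sketch; returning a status flag would follow the algorithm's OUTPUT step more literally.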


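The stopping-technique inequalities given with the Bisection method are applicable to
Newton's method. That is, select a tolerance $\varepsilon > 0$, and construct $p_1, \ldots, p_N$ until

\[ |p_N - p_{N-1}| < \varepsilon, \tag{2.8} \]

\[ \frac{|p_N - p_{N-1}|}{|p_N|} < \varepsilon, \quad p_N \ne 0, \tag{2.9} \]

or

\[ |f(p_N)| < \varepsilon. \tag{2.10} \]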
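The three stopping tests (2.8), (2.9), and (2.10) can be written as small Python predicates. This is a sketch under stated assumptions: the function names are hypothetical, and the test function $f(x) = (x - 1)^{10}$ is chosen only for illustration. It shows a case where the residual test is met although the iterate is still far from the zero at $p = 1$.

```python
def stop_absolute(p_new, p_old, eps):
    """Inequality (2.8): |p_N - p_{N-1}| < eps."""
    return abs(p_new - p_old) < eps


def stop_relative(p_new, p_old, eps):
    """Inequality (2.9): relative change, defined only when p_N != 0."""
    return p_new != 0 and abs(p_new - p_old) / abs(p_new) < eps


def stop_residual(f, p_new, eps):
    """Inequality (2.10): |f(p_N)| < eps."""
    return abs(f(p_new)) < eps


# Hypothetical illustration: the only zero of f is p = 1, yet the residual
# test with eps = 1e-3 is already satisfied at x = 1.5.
f = lambda x: (x - 1.0) ** 10
residual_ok = stop_residual(f, 1.5, 1e-3)
actual_error = abs(1.5 - 1.0)
print(residual_ok, actual_error)
```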
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Example 1 


Figure 2.9 


Note that the variable in the 
trigonometric function is in 
radian measure, not degrees. This 
will always be the case unless 
specified otherwise. 


Table 2.3

n    p_n
0    0.7853981635
1    0.7071067810
2    0.7602445972
3    0.7246674808
4    0.7487198858
5    0.7325608446
6    0.7434642113
7    0.7361282565

Table 2.4 (Newton's Method)

n    p_n
0    0.7853981635
1    0.7395361337
2    0.7390851781
3    0.7390851332
4    0.7390851332




A form of Inequality (2.8) is used in Step 4 of Algorithm 2.3. Note that none of the
inequalities (2.8), (2.9), or (2.10) gives precise information about the actual error $|p_N - p|$. (See
Exercises 16 and 17 in Section 2.1.)

Newton's method is a functional iteration technique with $p_n = g(p_{n-1})$, for which

\[ g(p_{n-1}) = p_{n-1} - \frac{f(p_{n-1})}{f'(p_{n-1})}, \quad \text{for } n \ge 1. \tag{2.11} \]

In fact, this is the functional iteration technique that was used to give the rapid convergence
we saw in column (e) of Table 2.2 in Section 2.2.

It is clear from Equation (2.7) that Newton's method cannot be continued if $f'(p_{n-1}) = 0$
for some $n$. In fact, we will see that the method is most effective when $f'$ is bounded away
from zero near $p$.


Consider the function $f(x) = \cos x - x$. Approximate a root of $f$ using (a) a fixed-point
method, and (b) Newton's method.


Solution (a) A solution to this root-finding problem is also a solution to the fixed-point
problem $x = \cos x$, and the graph in Figure 2.9 implies that a single fixed point $p$ lies in
$[0, \pi/2]$.


[Figure 2.9: graph of $y = \cos x$.]


Table 2.3 shows the results of fixed-point iteration with $p_0 = \pi/4$. The best we could
conclude from these results is that $p \approx 0.74$.

(b) To apply Newton's method to this problem we need $f'(x) = -\sin x - 1$. Starting
again with $p_0 = \pi/4$, we generate the sequence defined, for $n \ge 1$, by

\[ p_n = p_{n-1} - \frac{f(p_{n-1})}{f'(p_{n-1})} = p_{n-1} - \frac{\cos p_{n-1} - p_{n-1}}{-\sin p_{n-1} - 1}. \]

This gives the approximations in Table 2.4. An excellent approximation is obtained with
$n = 3$. Because of the agreement of $p_3$ and $p_4$ we could reasonably expect this result to be
accurate to the places listed.
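The two iterations compared in this example can be reproduced with a short Python sketch. It treats both methods as functional iterations $p_n = g(p_{n-1})$; the helper name `iterate` and the variable names are illustrative assumptions, and the printed digits may differ from the tables in the last place because the tables were computed with rounded arithmetic.

```python
import math


def iterate(g, p0, n):
    """Return [p_0, p_1, ..., p_n] for the functional iteration p_k = g(p_{k-1})."""
    seq = [p0]
    for _ in range(n):
        seq.append(g(seq[-1]))
    return seq


def g_fixed(x):
    # Fixed-point form x = cos x used in part (a).
    return math.cos(x)


def g_newton(x):
    # Newton iteration function for f(x) = cos x - x, with f'(x) = -sin x - 1.
    return x - (math.cos(x) - x) / (-math.sin(x) - 1.0)


fixed_seq = iterate(g_fixed, math.pi / 4, 7)     # compare with Table 2.3
newton_seq = iterate(g_newton, math.pi / 4, 4)   # compare with Table 2.4

for n, p in enumerate(fixed_seq):
    print(f"fixed-point  n={n}  p_n={p:.10f}")
for n, p in enumerate(newton_seq):
    print(f"Newton       n={n}  p_n={p:.10f}")
```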


Convergence using Newton’s Method 


Example 1 shows that Newton’s method can provide extremely accurate approximations 
with very few iterations. For that example, only one iteration of Newton’s method was 
needed to give better accuracy than 7 iterations of the fixed-point method. It is now time to 
examine Newton’s method more carefully to discover why it is so effective. 
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Theorem 2.6 


Solutions of Equations in One Variable 


The Taylor series derivation of Newton's method at the beginning of the section points
out the importance of an accurate initial approximation. The crucial assumption is that the
term involving $(p - p_0)^2$ is, by comparison with $|p - p_0|$, so small that it can be deleted.
This will clearly be false unless $p_0$ is a good approximation to $p$. If $p_0$ is not sufficiently
close to the actual root, there is little reason to suspect that Newton's method will converge
to the root. However, in some instances, even poor initial approximations will produce
convergence. (Exercises 20 and 21 illustrate some of these possibilities.)

The following convergence theorem for Newton’s method illustrates the theoretical 
importance of the choice of po. 


Let $f \in C^2[a, b]$. If $p \in (a, b)$ is such that $f(p) = 0$ and $f'(p) \ne 0$, then there exists a
$\delta > 0$ such that Newton's method generates a sequence $\{p_n\}_{n=1}^{\infty}$ converging to $p$ for any
initial approximation $p_0 \in [p - \delta, p + \delta]$.


Proof The proof is based on analyzing Newton's method as the functional iteration scheme
$p_n = g(p_{n-1})$, for $n \ge 1$, with

\[ g(x) = x - \frac{f(x)}{f'(x)}. \]

Let $k$ be in $(0, 1)$. We first find an interval $[p - \delta, p + \delta]$ that $g$ maps into itself and for which
$|g'(x)| \le k$, for all $x \in (p - \delta, p + \delta)$.

Since $f'$ is continuous and $f'(p) \ne 0$, part (a) of Exercise 29 in Section 1.1 implies
that there exists a $\delta_1 > 0$, such that $f'(x) \ne 0$ for $x \in [p - \delta_1, p + \delta_1] \subset [a, b]$. Thus $g$ is
defined and continuous on $[p - \delta_1, p + \delta_1]$. Also

\[ g'(x) = 1 - \frac{f'(x) f'(x) - f(x) f''(x)}{[f'(x)]^2} = \frac{f(x) f''(x)}{[f'(x)]^2}, \]

for $x \in [p - \delta_1, p + \delta_1]$, and, since $f \in C^2[a, b]$, we have $g \in C^1[p - \delta_1, p + \delta_1]$.

By assumption, $f(p) = 0$, so

\[ g'(p) = \frac{f(p) f''(p)}{[f'(p)]^2} = 0. \]

Since $g'$ is continuous and $0 < k < 1$, part (b) of Exercise 29 in Section 1.1 implies that
there exists a $\delta$, with $0 < \delta < \delta_1$, and

\[ |g'(x)| \le k, \quad \text{for all } x \in [p - \delta, p + \delta]. \]

It remains to show that $g$ maps $[p - \delta, p + \delta]$ into $[p - \delta, p + \delta]$. If $x \in [p - \delta, p + \delta]$,
the Mean Value Theorem implies that for some number $\xi$ between $x$ and $p$, $|g(x) - g(p)| = |g'(\xi)||x - p|$. So

\[ |g(x) - p| = |g(x) - g(p)| = |g'(\xi)||x - p| \le k|x - p| < |x - p|. \]

Since $x \in [p - \delta, p + \delta]$, it follows that $|x - p| < \delta$ and that $|g(x) - p| < \delta$. Hence, $g$ maps
$[p - \delta, p + \delta]$ into $[p - \delta, p + \delta]$.

All the hypotheses of the Fixed-Point Theorem 2.4 are now satisfied, so the sequence
$\{p_n\}_{n=1}^{\infty}$, defined by

\[ p_n = g(p_{n-1}) = p_{n-1} - \frac{f(p_{n-1})}{f'(p_{n-1})}, \quad \text{for } n \ge 1, \]

converges to $p$ for any $p_0 \in [p - \delta, p + \delta]$.
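The key fact used in the proof, that the iteration function $g(x) = x - f(x)/f'(x)$ has $g'(p) = 0$ at a simple zero $p$, can be checked numerically. The sketch below assumes the test function $f(x) = \cos x - x$ from Example 1; the helper name `g_prime` is illustrative, and the reference value of $p$ is obtained by running Newton's method well past convergence.

```python
import math

f = lambda x: math.cos(x) - x
df = lambda x: -math.sin(x) - 1.0
d2f = lambda x: -math.cos(x)


def g_prime(x):
    """Derivative of g(x) = x - f(x)/f'(x), that is, f(x) f''(x) / [f'(x)]^2."""
    return f(x) * d2f(x) / df(x) ** 2


# Locate p by running Newton's method well past convergence.
p = math.pi / 4
for _ in range(20):
    p = p - f(p) / df(p)

print(g_prime(p))            # essentially zero at the root
print(g_prime(math.pi / 4))  # small in magnitude near the root
```

Because $|g'|$ is small on a neighborhood of $p$, any $k$ in $(0, 1)$ admits a suitable $\delta$, which is the step the proof relies on.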
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The word secant is derived from 
the Latin word secan, which 
means to cut. The secant method 
uses a secant line, a line joining 
two points that cut the curve, to 
approximate a root. 


Figure 2.10 




Theorem 2.6 states that, under reasonable assumptions, Newton's method converges
provided a sufficiently accurate initial approximation is chosen. It also implies that the
constant $k$ that bounds the derivative of $g$, and, consequently, indicates the speed of convergence
of the method, decreases to 0 as the procedure continues. This result is important for the
theory of Newton's method, but it is seldom applied in practice because it does not tell us
how to determine $\delta$.

In a practical application, an initial approximation is selected and successive approx- 
imations are generated by Newton’s method. These will generally either converge quickly 
to the root, or it will be clear that convergence is unlikely. 


The Secant Method 


Newton’s method is an extremely powerful technique, but it has a major weakness: the need 
to know the value of the derivative of f at each approximation. Frequently, f’ (x) is far more 
difficult and needs more arithmetic operations to calculate than f(x). 

To circumvent the problem of the derivative evaluation in Newton's method, we introduce
a slight variation. By definition,

\[ f'(p_{n-1}) = \lim_{x \to p_{n-1}} \frac{f(x) - f(p_{n-1})}{x - p_{n-1}}. \]

If $p_{n-2}$ is close to $p_{n-1}$, then

\[ f'(p_{n-1}) \approx \frac{f(p_{n-2}) - f(p_{n-1})}{p_{n-2} - p_{n-1}} = \frac{f(p_{n-1}) - f(p_{n-2})}{p_{n-1} - p_{n-2}}. \]

Using this approximation for $f'(p_{n-1})$ in Newton's formula gives

\[ p_n = p_{n-1} - \frac{f(p_{n-1})(p_{n-1} - p_{n-2})}{f(p_{n-1}) - f(p_{n-2})}. \tag{2.12} \]

This technique is called the Secant method and is presented in Algorithm 2.4. (See
Figure 2.10.) Starting with the two initial approximations $p_0$ and $p_1$, the approximation $p_2$ is
the x-intercept of the line joining $(p_0, f(p_0))$ and $(p_1, f(p_1))$. The approximation $p_3$ is the
x-intercept of the line joining $(p_1, f(p_1))$ and $(p_2, f(p_2))$, and so on. Note that only one
function evaluation is needed per step for the Secant method after $p_2$ has been determined.
In contrast, each step of Newton's method requires an evaluation of both the function and
its derivative.
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CHAPTER 2 


Example 2 


Table 2.5

Secant
n    p_n
0    0.5
1    0.7853981635
2    0.7363841388
3    0.7390581392
4    0.7390851493
5    0.7390851332

Newton
n    p_n
0    0.7853981635
1    0.7395361337
2    0.7390851781
3    0.7390851332
4    0.7390851332




Secant 


To find a solution to f(x) = 0 given initial approximations p0 and p1:


INPUT  initial approximations p0, p1; tolerance TOL; maximum number of iterations N0.
OUTPUT approximate solution p or message of failure.
Step 1 Set i = 2;
           q0 = f(p0);
           q1 = f(p1).
Step 2 While i ≤ N0 do Steps 3-6.
    Step 3 Set p = p1 - q1(p1 - p0)/(q1 - q0). (Compute p_i.)
    Step 4 If |p - p1| < TOL then
               OUTPUT (p); (The procedure was successful.)
               STOP.
    Step 5 Set i = i + 1.
    Step 6 Set p0 = p1; (Update p0, q0, p1, q1.)
               q0 = q1;
               p1 = p;
               q1 = f(p).
Step 7 OUTPUT ('The method failed after N0 iterations, N0 =', N0);
       (The procedure was unsuccessful.)
       STOP.
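A minimal Python sketch of Algorithm 2.4 follows. The names `secant`, `tol`, and `max_iter` are illustrative, and the check for equal function values is an added safeguard that the algorithm as stated does not include. Only one new evaluation of $f$ is made per pass through the loop.

```python
import math


def secant(f, p0, p1, tol=1e-10, max_iter=50):
    """Sketch of Algorithm 2.4: return (p, i), the approximation and its index."""
    q0, q1 = f(p0), f(p1)                        # Step 1
    for i in range(2, max_iter + 1):             # Steps 2 and 5
        if q1 == q0:                             # added safeguard (assumption)
            raise ZeroDivisionError("f(p1) = f(p0); the secant line is horizontal")
        p = p1 - q1 * (p1 - p0) / (q1 - q0)      # Step 3: compute p_i
        if abs(p - p1) < tol:                    # Step 4
            return p, i
        p0, q0 = p1, q1                          # Step 6: shift the older pair
        p1, q1 = p, f(p)                         # the single new function evaluation
    raise RuntimeError(f"The method failed after {max_iter} iterations")  # Step 7


# Illustrative use with f(x) = cos x - x, p0 = 0.5, p1 = pi/4.
root, last_index = secant(lambda x: math.cos(x) - x, 0.5, math.pi / 4)
print(root, last_index)
```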


The next example involves a problem considered in Example 1, where we used Newton's
method with $p_0 = \pi/4$.


Use the Secant method to find a solution to $x = \cos x$, and compare the approximations
with those given in Example 1, which applied Newton's method.


Solution In Example 1 we compared fixed-point iteration and Newton's method starting
with the initial approximation $p_0 = \pi/4$. For the Secant method we need two initial
approximations. Suppose we use $p_0 = 0.5$ and $p_1 = \pi/4$. Succeeding approximations are
generated by the formula

\[ p_n = p_{n-1} - \frac{(p_{n-1} - p_{n-2})(\cos p_{n-1} - p_{n-1})}{(\cos p_{n-1} - p_{n-1}) - (\cos p_{n-2} - p_{n-2})}, \quad \text{for } n \ge 2. \]

These give the results in Table 2.5.


Comparing the results in Table 2.5 from the Secant method and Newton's method, we
see that the Secant method approximation $p_5$ is accurate to the tenth decimal place, whereas
Newton's method obtained this accuracy by $p_3$. For this example, the convergence of the
Secant method is much faster than functional iteration but slightly slower than Newton's
method. This is generally the case. (See Exercise 14 of Section 2.4.)

Newton’s method or the Secant method is often used to refine an answer obtained by 
another technique, such as the Bisection method, since these methods require good first 
approximations but generally give rapid convergence. 
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The term Regula Falsi, literally a 
false rule or false position, refers 
to a technique that uses results 
that are known to be false, but in 
some specific manner, to obtain 
convergence to a true result. False 
position problems can be found 
on the Rhind papyrus, which 
dates from about 1650 B.C.E.


Figure 2.11 
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The Method of False Position 


Each successive pair of approximations in the Bisection method brackets a root $p$ of the
equation; that is, for each positive integer $n$, a root lies between $a_n$ and $b_n$. This implies that,
for each $n$, the Bisection method iterations satisfy

\[ |p_n - p| < \frac{1}{2} |a_n - b_n|, \]

which provides an easily calculated error bound for the approximations.

Root bracketing is not guaranteed for either Newton's method or the Secant method.
In Example 1, Newton's method was applied to $f(x) = \cos x - x$, and an approximate root
was found to be 0.7390851332. Table 2.5 shows that this root is not bracketed by either $p_0$
and $p_1$ or $p_1$ and $p_2$. The Secant method approximations for this problem are also given in
Table 2.5. In this case the initial approximations $p_0$ and $p_1$ bracket the root, but the pair of
approximations $p_2$ and $p_3$ fails to do so, since both are smaller than the root.

The method of False Position (also called Regula Falsi) generates approximations 
in the same manner as the Secant method, but it includes a test to ensure that the root is 
always bracketed between successive iterations. Although it is not a method we generally 
recommend, it illustrates how bracketing can be incorporated. 

First choose initial approximations $p_0$ and $p_1$ with $f(p_0) \cdot f(p_1) < 0$. The
approximation $p_2$ is chosen in the same manner as in the Secant method, as the x-intercept of the
line joining $(p_0, f(p_0))$ and $(p_1, f(p_1))$. To decide which secant line to use to compute $p_3$,
consider $f(p_2) \cdot f(p_1)$, or more correctly $\operatorname{sgn} f(p_2) \cdot \operatorname{sgn} f(p_1)$.


• If $\operatorname{sgn} f(p_2) \cdot \operatorname{sgn} f(p_1) < 0$, then $p_1$ and $p_2$ bracket a root. Choose $p_3$ as the x-intercept
of the line joining $(p_1, f(p_1))$ and $(p_2, f(p_2))$.


• If not, choose $p_3$ as the x-intercept of the line joining $(p_0, f(p_0))$ and $(p_2, f(p_2))$, and
then interchange the indices on $p_0$ and $p_1$.


In a similar manner, once $p_3$ is found, the sign of $f(p_3) \cdot f(p_2)$ determines whether we
use $p_2$ and $p_3$ or $p_3$ and $p_1$ to compute $p_4$. In the latter case a relabeling of $p_2$ and $p_1$ is
performed. The relabeling ensures that the root is bracketed between successive iterations.
The process is described in Algorithm 2.5, and Figure 2.11 shows how the iterations can
differ from those of the Secant method. In this illustration, the first three approximations
are the same, but the fourth approximations differ.


[Figure 2.11: two panels, "Secant Method" and "Method of False Position", each showing the graph of $y = f(x)$.]
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False Position 


To find a solution to f(x) = 0 given the continuous function f on the interval [p0, p1]
where f(p0) and f(p1) have opposite signs:


INPUT  initial approximations p0, p1; tolerance TOL; maximum number of iterations N0.
OUTPUT approximate solution p or message of failure.
Step 1 Set i = 2;
           q0 = f(p0);
           q1 = f(p1).
Step 2 While i ≤ N0 do Steps 3-7.
    Step 3 Set p = p1 - q1(p1 - p0)/(q1 - q0). (Compute p_i.)
    Step 4 If |p - p1| < TOL then
               OUTPUT (p); (The procedure was successful.)
               STOP.
    Step 5 Set i = i + 1;
               q = f(p).
    Step 6 If q · q1 < 0 then set p0 = p1;
               q0 = q1.
    Step 7 Set p1 = p;
               q1 = q.
Step 8 OUTPUT ('Method failed after N0 iterations, N0 =', N0);
       (The procedure was unsuccessful.)
       STOP.
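Algorithm 2.5 can be sketched in Python in the same style. The function name `false_position` and its keyword arguments are illustrative assumptions; the initial sign check is an added guard reflecting the requirement that $f(p_0)$ and $f(p_1)$ have opposite signs.

```python
import math


def false_position(f, p0, p1, tol=1e-10, max_iter=100):
    """Sketch of Algorithm 2.5: return (p, i), the approximation and its index."""
    q0, q1 = f(p0), f(p1)                        # Step 1
    if q0 * q1 >= 0:                             # added guard (assumption)
        raise ValueError("f(p0) and f(p1) must have opposite signs")
    for i in range(2, max_iter + 1):             # Steps 2 and 5
        p = p1 - q1 * (p1 - p0) / (q1 - q0)      # Step 3: compute p_i
        if abs(p - p1) < tol:                    # Step 4
            return p, i
        q = f(p)                                 # Step 5
        if q * q1 < 0:                           # Step 6: p and p1 bracket the root
            p0, q0 = p1, q1
        p1, q1 = p, q                            # Step 7: otherwise p0 is retained
    raise RuntimeError(f"Method failed after {max_iter} iterations")  # Step 8


# Illustrative use with f(x) = cos x - x, p0 = 0.5, p1 = pi/4.
root, last_index = false_position(lambda x: math.cos(x) - x, 0.5, math.pi / 4)
print(root, last_index)
```

Step 6 is the only difference from the Secant sketch: the older endpoint is replaced only when the new point and $p_1$ lie on opposite sides of the root, so the bracket is preserved.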


Example 3 Use the method of False Position to find a solution to $x = \cos x$, and compare the
approximations with those given in Example 1, which applied fixed-point iteration and Newton's
method, and to those found in Example 2, which applied the Secant method.

Solution To make a reasonable comparison we will use the same initial approximations as
in the Secant method, that is, $p_0 = 0.5$ and $p_1 = \pi/4$. Table 2.6 shows the results of the
method of False Position applied to $f(x) = \cos x - x$ together with those we obtained using
the Secant and Newton's methods. Notice that the False Position and Secant approximations
agree through $p_3$ and that the method of False Position requires an additional iteration to
obtain the same accuracy as the Secant method.
Table 2.6

     False Position    Secant          Newton
n    p_n               p_n             p_n
0    0.5               0.5             0.7853981635
1    0.7853981635      0.7853981635    0.7395361337
2    0.7363841388      0.7363841388    0.7390851781
3    0.7390581392      0.7390581392    0.7390851332
4    0.7390848638      0.7390851493    0.7390851332
5    0.7390851305      0.7390851332
6    0.7390851332
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The added insurance of the method of False Position commonly requires more calcula- 
tion than the Secant method, just as the simplification that the Secant method provides over 
Newton’s method usually comes at the expense of additional iterations. Further examples 
of the positive and negative features of these methods can be seen by working Exercises 17 
and 18. 

Maple has Newton’s method, the Secant method, and the method of False Position 
implemented in its NumericalAnalysis package. The options that were available for the 
Bisection method are also available for these techniques. For example, to generate the 
results in Tables 2.4, 2.5, and 2.6 we could use the commands 


with(Student[NumericalAnalysis])

f := cos(x) - x

Newton(f, x = π/4.0, tolerance = 10^(-8), output = sequence, maxiterations = 20)

Secant(f, x = [0.5, π/4.0], tolerance = 10^(-8), output = sequence, maxiterations = 20)

and

FalsePosition(f, x = [0.5, π/4.0], tolerance = 10^(-8), output = sequence, maxiterations = 20)


EXERCISE SET 2.3


1. 


10. 
11. 


Let $f(x) = x^2 - 6$ and $p_0 = 1$. Use Newton's method to find $p_2$.

Let $f(x) = -x^3 - \cos x$ and $p_0 = -1$. Use Newton's method to find $p_2$. Could $p_0 = 0$ be used?
Let $f(x) = x^2 - 6$. With $p_0 = 3$ and $p_1 = 2$, find $p_3$.

a. Use the Secant method.

b. Use the method of False Position.

c. Which of a. or b. is closer to $\sqrt{6}$?

Let $f(x) = -x^3 - \cos x$. With $p_0 = -1$ and $p_1 = 0$, find $p_3$.


a. Use the Secant method. b. Use the method of False Position.
Use Newton’s method to find solutions accurate to within 10~‘ for the following problems. 
a x» —2x7-5=0, [1,4] b. x3+3x7-1=0, [-3,-2] 

ec x—cosx=0, [0,7/2] d. x—0.8—0.2sinx=0, [0,7/2] 


Use Newton’s method to find solutions accurate to within 10~> for the following problems. 
a e°+2%+42cosx—6=0 forl<x<2 

b. In@—1)+cosx—1)=0 forl.3<x<2 

ce. 2xcos2x—(x—2)?=0 for2<x<3and3<x<4 
d. (x—2)?—-Inx=0 forl<x<2ande<x<4 
e 
f. 


e—3x°=0 for0<x<land3<x<5 


snx—-e*=0 forO0<x<13<x<4and6<x<7 


Repeat Exercise 5 using the Secant method. 

Repeat Exercise 6 using the Secant method. 

Repeat Exercise 5 using the method of False Position. 

Repeat Exercise 6 using the method of False Position. 

Use all three methods in this Section to find solutions to within 10~> for the following problems. 
a. 3xe°=0 forl<x<2 


b. 2x+3cosx—e*=0 for0<x<1 
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12. 


13. 


14. 


15. 


16. 


17. 


18. 


19. 


20. 


21. 




Use all three methods in this Section to find solutions to within 10~’ for the following problems. 

a x°—4x+4-Inx=0 forl <x <2andfor2<x<4 

b x+1-—2sinnx=0 for0<x<1/2andfor1/2<x<1 

Use Newton’s method to approximate, to within 10~*, the value of x that produces the point on the 
graph of y = x? that is closest to (1, 0). [Hint: Minimize [d(x)]?, where d(x) represents the distance 
from (x,x”) to (1, 0).] 

Use Newton’s method to approximate, to within 10~*, the value of x that produces the point on the 
graph of y = 1/x that is closest to (2, 1). 

The following describes Newton's method graphically: Suppose that $f'(x)$ exists on $[a, b]$ and that
$f'(x) \ne 0$ on $[a, b]$. Further, suppose there exists one $p \in [a, b]$ such that $f(p) = 0$, and let $p_0 \in [a, b]$
be arbitrary. Let $p_1$ be the point at which the tangent line to $f$ at $(p_0, f(p_0))$ crosses the x-axis. For
each $n \ge 1$, let $p_n$ be the x-intercept of the line tangent to $f$ at $(p_{n-1}, f(p_{n-1}))$. Derive the formula
describing this method.


Use Newton’s method to solve the equation 


duped aig oes, we 
= x xsin COS 2X, Wil = a 
2°4 "3 Bes 


Iterate using Newton’s method until an accuracy of 10~> is obtained. Explain why the result seems 


unusual for Newton’s method. Also, solve the equation with po = 5z and po = 10z. 


The fourth-degree polynomial

\[ f(x) = 230x^4 + 18x^3 + 9x^2 - 221x - 9 \]

has two real zeros, one in $[-1, 0]$ and the other in $[0, 1]$. Attempt to approximate these zeros to within
$10^{-6}$ using the


a. Method of False Position 
b. Secant method 
c. Newton’s method 


Use the endpoints of each interval as the initial approximations in (a) and (b) and the midpoints as 
the initial approximation in (c). 


The function $f(x) = \tan \pi x - 6$ has a zero at $(1/\pi) \arctan 6 \approx 0.447431543$. Let $p_0 = 0$ and
$p_1 = 0.48$, and use ten iterations of each of the following methods to approximate this root. Which
method is most successful and why?


a. Bisection method
b. Method of False Position 
c. Secant method 


The iteration equation for the Secant method can be written in the simpler form

\[ p_n = \frac{f(p_{n-1}) p_{n-2} - f(p_{n-2}) p_{n-1}}{f(p_{n-1}) - f(p_{n-2})}. \]

Explain why, in general, this iteration equation is likely to be less accurate than the one given in
Algorithm 2.4.


The equation x” — 10 cos x = 0 has two solutions, +1.3793646. Use Newton’s method to approximate 
the solutions to within 10~> with the following values of po. 


a Ppo= —100 b. Po= —50 Cc P= —25 
d. Po= 25 eQ P= 50 f. P= 100 


The equation 4x? — e* — e~* = 0 has two positive solutions x, and x. Use Newton’s method to 
approximate the solution to within 10~> with the following values of po. 
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a Po= —10 b. P= —5 Cc Po= —3 
d. Po=-l e. Po =0 f. Po=l 
2. Po= 3 h. Po= 5 i. Po = 10 


22. Use Maple to determine how many iterations of Newton’s method with po = 2/4 are needed to find 
aroot of f(x) = cosx — x to within 107-!. 


23. The function described by f(x) = In(x? + 1) — e°** cos zx has an infinite number of zeros. 
a. Determine, within 10~°, the only negative zero. 
b. Determine, within 10~°, the four smallest positive zeros. 


c. Determine a reasonable initial approximation to find the nth smallest positive zero of f. [Hint: 
Sketch an approximate graph of f.] 


d. Use part (c) to determine, within 10~°, the 25th smallest positive zero of f. 


24. Find an approximation for A, accurate to within 10~*, for the population equation 


,. 4 0, 
1,564,000 = 1,000,000e* + ae — 1), 


discussed in the introduction to this chapter. Use this value to predict the population at the end of the 
second year, assuming that the immigration rate during this year remains at 435,000 individuals per 
year. 

25. The sum of two numbers is 20. If each number is added to its square root, the product of the two sums 
is 155.55. Determine the two numbers to within 10~*. 


26. The accumulated value of a savings account based on regular periodic payments can be determined
from the annuity due equation,

\[ A = \frac{P}{i}\left[(1 + i)^n - 1\right]. \]

In this equation, $A$ is the amount in the account, $P$ is the amount regularly deposited, and $i$ is the rate
of interest per period for the $n$ deposit periods. An engineer would like to have a savings account
valued at $750,000 upon retirement in 20 years and can afford to put $1500 per month toward this
goal. What is the minimal interest rate at which this amount can be invested, assuming that the interest
is compounded monthly?


27. Problems involving the amount of money required to pay off a mortgage over a fixed period of time
involve the formula

\[ A = \frac{P}{i}\left[1 - (1 + i)^{-n}\right], \]

known as an ordinary annuity equation. In this equation, $A$ is the amount of the mortgage, $P$ is the
amount of each payment, and $i$ is the interest rate per period for the $n$ payment periods. Suppose that a
30-year home mortgage in the amount of $135,000 is needed and that the borrower can afford house
payments of at most $1000 per month. What is the maximal interest rate the borrower can afford to
pay?

28. A drug administered to a patient produces a concentration in the blood stream given by $c(t) = Ate^{-t/3}$
milligrams per milliliter, $t$ hours after $A$ units have been injected. The maximum safe concentration
is 1 mg/mL.

a. What amount should be injected to reach this maximum safe concentration, and when does this
maximum occur?


b. An additional amount of this drug is to be administered to the patient after the concentration falls
to 0.25 mg/mL. Determine, to the nearest minute, when this second injection should be given.


c. Assume that the concentration from consecutive injections is additive and that 75% of the amount 
originally injected is administered in the second injection. When is it time for the third injection? 


29. Let f(x) = 3*4! — 7.57, 
a. Use the Maple commands solve and fsolve to try to find all roots of f. 
b. Plot f(x) to find initial approximations to roots of f. 
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30. 
31. 


32. 


33. 


34. 




c. Use Newton’s method to find roots of f to within 107'°. 

d. Find the exact solutions of f(x) = 0 without using Maple. 

Repeat Exercise 29 using f(x) = 8 pe, 

The logistic population growth model is described by an equation of the form 


po =! 


1l—ce-’ 

where P,,c, and k > 0 are constants, and P(t) is the population at time ¢. P;, represents the limiting 
value of the population since lim,_,.. P(t) = Py. Use the census data for the years 1950, 1960, and 
1970 listed in the table on page 105 to determine the constants P,, c, and k for a logistic growth model. 
Use the logistic model to predict the population of the United States in 1980 and in 2010, assuming 
t = 0 at 1950. Compare the 1980 prediction to the actual value. 

The Gompertz population growth model is described by 


— 


P(t) =P,e 


where P;,c, and k > 0 are constants, and P(t) is the population at time t. Repeat Exercise 31 using 
the Gompertz growth model in place of the logistic model. 


Player A will shut out (win by a score of 21-0) player B in a game of racquetball with probability 


l+p p 21 
eS ae 
2 l-p+p 


where p denotes the probability A will win any specific rally (independent of the server). (See 
[Keller, J], p. 267.) Determine, to within 10~?, the minimal value of p that will ensure that A will shut 
out B in at least half the matches they play. 
In the design of all-terrain vehicles, it is necessary to consider the failure of the vehicle when attempting 
to negotiate two types of obstacles. One type of failure is called hang-up failure and occurs when the 
vehicle attempts to cross an obstacle that causes the bottom of the vehicle to touch the ground. The 
other type of failure is called nose-in failure and occurs when the vehicle descends into a ditch and 
its nose touches the ground. 

The accompanying figure, adapted from [Bek], shows the components associated with the
nose-in failure of a vehicle. In that reference it is shown that the maximum angle $\alpha$ that can be negotiated by
a vehicle when $\beta$ is the maximum angle at which hang-up failure does not occur satisfies the equation

\[ A \sin\alpha \cos\alpha + B \sin^2\alpha - C \cos\alpha - E \sin\alpha = 0, \]

where

\[ A = l \sin\beta_1, \quad B = l \cos\beta_1, \quad C = (h + 0.5D) \sin\beta_1 - 0.5D \tan\beta_1, \]

and $E = (h + 0.5D) \cos\beta_1 - 0.5D$.


a. It is stated that when $l = 89$ in., $h = 49$ in., $D = 55$ in., and $\beta_1 = 11.5^\circ$, angle $\alpha$ is approximately
$33^\circ$. Verify this result.


b. Find $\alpha$ for the situation when $l$, $h$, and $\beta_1$ are the same as in part (a) but $D = 30$ in.
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| 2.4 Error Analysis for Iterative Methods 


In this section we investigate the order of convergence of functional iteration schemes and, 
as a means of obtaining rapid convergence, rediscover Newton’s method. We also consider 
ways of accelerating the convergence of Newton’s method in special circumstances. First, 
however, we need a new procedure for measuring how rapidly a sequence converges. 


Order of Convergence 


Definition 2.7 Suppose $\{p_n\}_{n=0}^{\infty}$ is a sequence that converges to $p$, with $p_n \ne p$ for all $n$. If positive constants
$\lambda$ and $\alpha$ exist with

\[ \lim_{n \to \infty} \frac{|p_{n+1} - p|}{|p_n - p|^{\alpha}} = \lambda, \]

then $\{p_n\}_{n=0}^{\infty}$ converges to $p$ of order $\alpha$, with asymptotic error constant $\lambda$.


An iterative technique of the form $p_n = g(p_{n-1})$ is said to be of order $\alpha$ if the sequence
$\{p_n\}_{n=0}^{\infty}$ converges to the solution $p = g(p)$ of order $\alpha$.

In general, a sequence with a high order of convergence converges more rapidly than a
sequence with a lower order. The asymptotic constant affects the speed of convergence but
not to the extent of the order. Two cases of order are given special attention.


(i) If $\alpha = 1$ (and $\lambda < 1$), the sequence is linearly convergent.
(ii) If $\alpha = 2$, the sequence is quadratically convergent.
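The limit in Definition 2.7 can be estimated numerically. The sketch below is an illustration under stated assumptions: it uses Newton's iteration for $f(x) = x^2 - 2$, namely $p_n = (p_{n-1} + 2/p_{n-1})/2$, starting from $p_0 = 1.5$, and Python's `decimal` module with 120 digits so that the very small errors are not lost to rounding. With $\alpha = 2$ the ratios $|p_{n+1} - p| / |p_n - p|^2$ settle near $1/(2\sqrt{2})$, which suggests quadratic convergence for this iteration.

```python
from decimal import Decimal, getcontext

getcontext().prec = 120              # working precision (an assumption for this sketch)

two = Decimal(2)
p_exact = two.sqrt()                 # reference value of the limit p = sqrt(2)

p = Decimal("1.5")
errors = [abs(p - p_exact)]
for _ in range(5):
    p = (p + two / p) / 2            # Newton step for f(x) = x^2 - 2
    errors.append(abs(p - p_exact))

# Ratios |e_{n+1}| / |e_n|^alpha with alpha = 2.
ratios = [errors[n + 1] / errors[n] ** 2 for n in range(5)]
for n, r in enumerate(ratios):
    print(n, float(r))
```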


The next illustration compares a linearly convergent sequence to one that is quadrati- 
cally convergent. It shows why we try to find methods that produce higher-order convergent 
sequences. 


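Illustration Suppose that $\{p_n\}_{n=0}^{\infty}$ is linearly convergent to 0 with

\[ \lim_{n \to \infty} \frac{|p_{n+1}|}{|p_n|} = 0.5 \]

and that $\{\tilde{p}_n\}_{n=0}^{\infty}$ is quadratically convergent to 0 with the same asymptotic error constant,

\[ \lim_{n \to \infty} \frac{|\tilde{p}_{n+1}|}{|\tilde{p}_n|^2} = 0.5. \]

For simplicity we assume that for each $n$ we have

\[ \frac{|p_{n+1}|}{|p_n|} \approx 0.5 \quad \text{and} \quad \frac{|\tilde{p}_{n+1}|}{|\tilde{p}_n|^2} \approx 0.5. \]

For the linearly convergent scheme, this means that

\[ |p_n - 0| = |p_n| \approx 0.5|p_{n-1}| \approx (0.5)^2 |p_{n-2}| \approx \cdots \approx (0.5)^n |p_0|, \]

whereas the quadratically convergent procedure has

\[ \begin{aligned}
|\tilde{p}_n - 0| = |\tilde{p}_n| &\approx 0.5|\tilde{p}_{n-1}|^2 \approx (0.5)\left[0.5|\tilde{p}_{n-2}|^2\right]^2 = (0.5)^3 |\tilde{p}_{n-2}|^4 \\
&\approx (0.5)^3 \left[(0.5)|\tilde{p}_{n-3}|^2\right]^4 = (0.5)^7 |\tilde{p}_{n-3}|^8 \\
&\approx \cdots \approx (0.5)^{2^n - 1} |\tilde{p}_0|^{2^n}.
\end{aligned} \]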
Table 2.7 illustrates the relative speed of convergence of the sequences to 0 if $|p_0| = |\tilde{p}_0| = 1$.


Table 2.7

      Linear Convergence                   Quadratic Convergence
      Sequence $\{p_n\}_{n=0}^{\infty}$    Sequence $\{\tilde{p}_n\}_{n=0}^{\infty}$
 n    $(0.5)^n$                            $(0.5)^{2^n-1}$

 1    $5.0000 \times 10^{-1}$              $5.0000 \times 10^{-1}$
 2    $2.5000 \times 10^{-1}$              $1.2500 \times 10^{-1}$
 3    $1.2500 \times 10^{-1}$              $7.8125 \times 10^{-3}$
 4    $6.2500 \times 10^{-2}$              $3.0518 \times 10^{-5}$
 5    $3.1250 \times 10^{-2}$              $4.6566 \times 10^{-10}$
 6    $1.5625 \times 10^{-2}$              $1.0842 \times 10^{-19}$
 7    $7.8125 \times 10^{-3}$              $5.8775 \times 10^{-39}$


The quadratically convergent sequence is within $10^{-38}$ of 0 by the seventh term. At least
126 terms are needed to ensure this accuracy for the linearly convergent sequence.
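
The entries of Table 2.7 can be regenerated with a few lines of code. The following is a minimal sketch, assuming the idealized relations $|p_n| = (0.5)^n|p_0|$ and $|\tilde{p}_n| = (0.5)^{2^n-1}|\tilde{p}_0|^{2^n}$ hold exactly; the function name convergence_table and its parameters are illustrative choices, not part of the text.

```python
def convergence_table(n_max=7, p0=1.0, lam=0.5):
    """Return rows (n, linear_term, quadratic_term) for n = 1..n_max.

    linear_term    = lam**n * |p0|                      (order 1)
    quadratic_term = lam**(2**n - 1) * |p0|**(2**n)     (order 2)
    """
    rows = []
    for n in range(1, n_max + 1):
        linear = lam**n * abs(p0)
        quadratic = lam**(2**n - 1) * abs(p0)**(2**n)
        rows.append((n, linear, quadratic))
    return rows


# Print the table in the same scientific format used in Table 2.7.
for n, linear, quadratic in convergence_table():
    print(f"{n:2d}  {linear:.4e}  {quadratic:.4e}")
```

Changing lam or p0 shows how strongly the exponent $2^n$ dominates the asymptotic error constant once the terms are below 1 in magnitude.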


Quadratically convergent sequences are expected to converge much more quickly than those
that converge only linearly, but the next result implies that an arbitrary technique that
generates a convergent sequence does so only linearly.


Theorem 2.8 Let $g \in C[a,b]$ be such that $g(x) \in [a,b]$, for all $x \in [a,b]$. Suppose, in addition, that $g'$ is
continuous on $(a,b)$ and a positive constant $k < 1$ exists with

\[ |g'(x)| \le k, \quad \text{for all } x \in (a,b). \]

If $g'(p) \neq 0$, then for any number $p_0 \neq p$ in $[a,b]$, the sequence

\[ p_n = g(p_{n-1}), \quad \text{for } n \ge 1, \]

converges only linearly to the unique fixed point $p$ in $[a,b]$.
Proof We know from the Fixed-Point Theorem 2.4 in Section 2.2 that the sequence converges
to $p$. Since $g'$ exists on $(a,b)$, we can apply the Mean Value Theorem to $g$ to show
that for any $n$,

\[ p_{n+1} - p = g(p_n) - g(p) = g'(\xi_n)(p_n - p), \]

where $\xi_n$ is between $p_n$ and $p$. Since $\{p_n\}_{n=0}^{\infty}$ converges to $p$, we also have $\{\xi_n\}_{n=0}^{\infty}$ converging
to $p$. Since $g'$ is continuous on $(a,b)$, we have

\[ \lim_{n\to\infty} g'(\xi_n) = g'(p). \]

Thus

\[ \lim_{n\to\infty} \frac{p_{n+1} - p}{p_n - p} = \lim_{n\to\infty} g'(\xi_n) = g'(p) \quad\text{and}\quad \lim_{n\to\infty} \frac{|p_{n+1} - p|}{|p_n - p|} = |g'(p)|. \]

Hence, if $g'(p) \neq 0$, fixed-point iteration exhibits linear convergence with asymptotic error
constant $|g'(p)|$.
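
The conclusion of Theorem 2.8 can be observed numerically. The sketch below is only an illustration under stated assumptions: it uses the fixed-point function $g(x) = (10/(x+4))^{1/2}$ that appears later in this chapter, takes a long run of the iteration as a stand-in for the exact fixed point, and the helper names (fixed_point_iterates, g_prime, p_ref) are hypothetical.

```python
import math


def g(x):
    # Fixed-point form of x^3 + 4x^2 - 10 = 0 used elsewhere in the chapter.
    return math.sqrt(10.0 / (x + 4.0))


def g_prime(x):
    # Derivative of g, computed by hand: -(1/2) * sqrt(10) * (x + 4)^(-3/2).
    return -0.5 * math.sqrt(10.0) * (x + 4.0) ** (-1.5)


def fixed_point_iterates(func, p0, n_steps):
    """Return [p_0, p_1, ..., p_{n_steps}] with p_n = func(p_{n-1})."""
    iterates = [p0]
    for _ in range(n_steps):
        iterates.append(func(iterates[-1]))
    return iterates


# Assumption: 200 iterations are far more than enough to reach the fixed
# point to machine precision, so the last iterate serves as a reference p.
p_ref = fixed_point_iterates(g, 1.5, 200)[-1]

iterates = fixed_point_iterates(g, 1.5, 8)
errors = [abs(p - p_ref) for p in iterates]
ratios = [errors[n + 1] / errors[n] for n in range(len(errors) - 1)]

print("|g'(p)| =", abs(g_prime(p_ref)))
for n, r in enumerate(ratios):
    print(n, r)   # successive error ratios |p_{n+1} - p| / |p_n - p|
```

The printed ratios settle near $|g'(p)|$, which is the asymptotic error constant predicted by the theorem.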
Theorem 2.8 implies that higher-order convergence for fixed-point methods of the form
$g(p) = p$ can occur only when $g'(p) = 0$. The next result describes additional conditions
that ensure the quadratic convergence we seek.


Theorem 2.9 Let $p$ be a solution of the equation $x = g(x)$. Suppose that $g'(p) = 0$ and $g''$ is continuous
with $|g''(x)| < M$ on an open interval $I$ containing $p$. Then there exists a $\delta > 0$ such that,
for $p_0 \in [p - \delta, p + \delta]$, the sequence defined by $p_n = g(p_{n-1})$, when $n \ge 1$, converges at
least quadratically to $p$. Moreover, for sufficiently large values of $n$,

\[ |p_{n+1} - p| < \frac{M}{2}|p_n - p|^2. \]


Proof Choose $k$ in $(0,1)$ and $\delta > 0$ such that on the interval $[p - \delta, p + \delta]$, contained in $I$, we
have $|g'(x)| \le k$ and $g''$ continuous. Since $|g'(x)| \le k < 1$, the argument used in the proof
of Theorem 2.6 in Section 2.3 shows that the terms of the sequence $\{p_n\}_{n=0}^{\infty}$ are contained
in $[p - \delta, p + \delta]$. Expanding $g(x)$ in a linear Taylor polynomial for $x \in [p - \delta, p + \delta]$ gives

\[ g(x) = g(p) + g'(p)(x - p) + \frac{g''(\xi)}{2}(x - p)^2, \]

where $\xi$ lies between $x$ and $p$. The hypotheses $g(p) = p$ and $g'(p) = 0$ imply that

\[ g(x) = p + \frac{g''(\xi)}{2}(x - p)^2. \]

In particular, when $x = p_n$,

\[ p_{n+1} = g(p_n) = p + \frac{g''(\xi_n)}{2}(p_n - p)^2, \]

with $\xi_n$ between $p_n$ and $p$. Thus,

\[ p_{n+1} - p = \frac{g''(\xi_n)}{2}(p_n - p)^2. \]

Since $|g'(x)| \le k < 1$ on $[p - \delta, p + \delta]$ and $g$ maps $[p - \delta, p + \delta]$ into itself, it follows from
the Fixed-Point Theorem that $\{p_n\}_{n=0}^{\infty}$ converges to $p$. But $\xi_n$ is between $p$ and $p_n$ for each
$n$, so $\{\xi_n\}_{n=0}^{\infty}$ also converges to $p$, and

\[ \lim_{n\to\infty} \frac{|p_{n+1} - p|}{|p_n - p|^2} = \frac{|g''(p)|}{2}. \]

This result implies that the sequence $\{p_n\}_{n=0}^{\infty}$ is quadratically convergent if $g''(p) \neq 0$ and
of higher-order convergence if $g''(p) = 0$.

Because $g''$ is continuous and strictly bounded by $M$ on the interval $[p - \delta, p + \delta]$, this
also implies that, for sufficiently large values of $n$,

\[ |p_{n+1} - p| < \frac{M}{2}|p_n - p|^2. \]


Theorems 2.8 and 2.9 tell us that our search for quadratically convergent fixed-point 
methods should point in the direction of functions whose derivatives are zero at the fixed 
point. That is: 


• For a fixed-point method to converge quadratically we need to have both $g(p) = p$ and
$g'(p) = 0$.
For polynomials, $p$ is a zero of multiplicity $m$ of $f$ if $f(x) = (x - p)^m q(x)$, where $q(p) \neq 0$.


The easiest way to construct a fixed-point problem associated with a root-finding problem
$f(x) = 0$ is to add or subtract a multiple of $f(x)$ from $x$. Consider the sequence

\[ p_n = g(p_{n-1}), \quad \text{for } n \ge 1, \]

for $g$ in the form

\[ g(x) = x - \phi(x) f(x), \]

where $\phi$ is a differentiable function that will be chosen later.
For the iterative procedure derived from $g$ to be quadratically convergent, we need to
have $g'(p) = 0$ when $f(p) = 0$. Because

\[ g'(x) = 1 - \phi'(x) f(x) - f'(x)\phi(x), \]

and $f(p) = 0$, we have

\[ g'(p) = 1 - \phi'(p) f(p) - f'(p)\phi(p) = 1 - \phi'(p)\cdot 0 - f'(p)\phi(p) = 1 - f'(p)\phi(p), \]

and $g'(p) = 0$ if and only if $\phi(p) = 1/f'(p)$.
If we let $\phi(x) = 1/f'(x)$, then we will ensure that $\phi(p) = 1/f'(p)$ and produce the
quadratically convergent procedure

\[ p_n = g(p_{n-1}) = p_{n-1} - \frac{f(p_{n-1})}{f'(p_{n-1})}. \]

This, of course, is simply Newton’s method. Hence


• If $f(p) = 0$ and $f'(p) \neq 0$, then for starting values sufficiently close to $p$, Newton’s
method will converge at least quadratically.
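
To see this quadratic behavior in practice, the iteration $p_n = p_{n-1} - f(p_{n-1})/f'(p_{n-1})$ can be coded directly. The following is a minimal sketch, assuming the test function $f(x) = x^3 + 4x^2 - 10$ and starting value $p_0 = 1.5$ used later in this section; the function name newton_iterates is illustrative, and the last computed iterate is used as a stand-in for the exact zero.

```python
def newton_iterates(f, f_prime, p0, n_steps):
    """Return [p_0, ..., p_{n_steps}] for p_n = p_{n-1} - f(p_{n-1}) / f'(p_{n-1})."""
    iterates = [p0]
    for _ in range(n_steps):
        p = iterates[-1]
        iterates.append(p - f(p) / f_prime(p))
    return iterates


def f(x):
    return x**3 + 4 * x**2 - 10


def f_prime(x):
    return 3 * x**2 + 8 * x


iterates = newton_iterates(f, f_prime, 1.5, 5)
p_ref = iterates[-1]   # assumption: five steps already reach machine precision

# Estimate the asymptotic error constant |e_{n+1}| / |e_n|^2 for the early steps.
for n in range(3):
    e_n = abs(iterates[n] - p_ref)
    e_next = abs(iterates[n + 1] - p_ref)
    print(n + 1, iterates[n + 1], e_next / e_n**2)
```

The ratio in the last column stays bounded as the error shrinks, which is what Definition 2.7 requires for order $\alpha = 2$.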


Multiple Roots 


In the preceding discussion, the restriction was made that $f'(p) \neq 0$, where $p$ is the solution
to $f(x) = 0$. In particular, Newton’s method and the Secant method will generally give
problems if $f'(p) = 0$ when $f(p) = 0$. To examine these difficulties in more detail, we
make the following definition.


Definition 2.10 A solution $p$ of $f(x) = 0$ is a zero of multiplicity $m$ of $f$ if for $x \neq p$, we can write
$f(x) = (x - p)^m q(x)$, where $\lim_{x\to p} q(x) \neq 0$.


In essence, $q(x)$ represents that portion of $f(x)$ that does not contribute to the zero of
$f$. The following result gives a means to easily identify simple zeros of a function, those
that have multiplicity one.


Theorem 2.11 The function $f \in C^1[a,b]$ has a simple zero at $p$ in $(a,b)$ if and only if $f(p) = 0$, but
$f'(p) \neq 0$.


Proof If $f$ has a simple zero at $p$, then $f(p) = 0$ and $f(x) = (x - p)q(x)$, where
$\lim_{x\to p} q(x) \neq 0$. Since $f \in C^1[a,b]$,

\[ f'(p) = \lim_{x\to p} f'(x) = \lim_{x\to p}\left[q(x) + (x - p)q'(x)\right] = \lim_{x\to p} q(x) \neq 0. \]

Conversely, if $f(p) = 0$, but $f'(p) \neq 0$, expand $f$ in a zeroth Taylor polynomial about $p$.
Then

\[ f(x) = f(p) + f'(\xi(x))(x - p) = (x - p) f'(\xi(x)), \]

where $\xi(x)$ is between $x$ and $p$. Since $f \in C^1[a,b]$,

\[ \lim_{x\to p} f'(\xi(x)) = f'\left(\lim_{x\to p}\xi(x)\right) = f'(p) \neq 0. \]

Letting $q = f' \circ \xi$ gives $f(x) = (x - p)q(x)$, where $\lim_{x\to p} q(x) \neq 0$. Thus $f$ has a simple
zero at $p$.


Table 2.8

 n    $p_n$
 0    1.0
 1    0.58198
 2    0.31906
 3    0.16800
 4    0.08635
 5    0.04380
 6    0.02206
 7    0.01107
 8    0.005545
 9    $2.7750 \times 10^{-3}$
10    $1.3881 \times 10^{-3}$
11    $6.9411 \times 10^{-4}$
12    $3.4703 \times 10^{-4}$
13    $1.7416 \times 10^{-4}$
14    $8.8041 \times 10^{-5}$
15    $4.2610 \times 10^{-5}$
16    $1.9142 \times 10^{-6}$


Figure 2.12 Graph of $f(x) = e^x - x - 1$.


The following generalization of Theorem 2.11 is considered in Exercise 12.


Theorem 2.12 The function $f \in C^m[a,b]$ has a zero of multiplicity $m$ at $p$ in $(a,b)$ if and only if

\[ 0 = f(p) = f'(p) = f''(p) = \cdots = f^{(m-1)}(p), \quad\text{but}\quad f^{(m)}(p) \neq 0. \]


The result in Theorem 2.12 implies that an interval about $p$ exists where Newton’s
method converges quadratically to $p$ for any initial approximation $p_0 \neq p$, provided that $p$
is a simple zero. The following example shows that quadratic convergence might not occur
if the zero is not simple.


Example 1 Let $f(x) = e^x - x - 1$. (a) Show that $f$ has a zero of multiplicity 2 at $x = 0$. (b) Show that
Newton’s method with $p_0 = 1$ converges to this zero but not quadratically.

Solution (a) We have

\[ f(x) = e^x - x - 1, \quad f'(x) = e^x - 1 \quad\text{and}\quad f''(x) = e^x, \]

so

\[ f(0) = e^0 - 0 - 1 = 0, \quad f'(0) = e^0 - 1 = 0 \quad\text{and}\quad f''(0) = e^0 = 1. \]

Theorem 2.12 implies that $f$ has a zero of multiplicity 2 at $x = 0$.


(b) The first two terms generated by Newton’s method applied to $f$ with $p_0 = 1$ are

\[ p_1 = p_0 - \frac{f(p_0)}{f'(p_0)} = 1 - \frac{e - 2}{e - 1} \approx 0.58198, \]

and

\[ p_2 = p_1 - \frac{f(p_1)}{f'(p_1)} \approx 0.58198 - \frac{0.20760}{0.78957} \approx 0.31906. \]

The first sixteen terms of the sequence generated by Newton’s method are shown in Table
2.8. The sequence is clearly converging to 0, but not quadratically. The graph of $f$ is shown
in Figure 2.12.
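
The slow convergence in Example 1 can be reproduced with a short script. This is a sketch under stated assumptions: it evaluates $f$ and $f'$ through math.expm1 (a choice made here to limit cancellation near 0, not something the example prescribes), and it runs in double precision, so the last few iterates need not match the rounded entries of Table 2.8 digit for digit.

```python
import math


def f(x):
    # f(x) = e^x - x - 1, written with expm1(x) = e^x - 1 to limit cancellation.
    return math.expm1(x) - x


def f_prime(x):
    # f'(x) = e^x - 1
    return math.expm1(x)


# Newton's method with p_0 = 1, sixteen steps as in Table 2.8.
p = [1.0]
for _ in range(16):
    p.append(p[-1] - f(p[-1]) / f_prime(p[-1]))

# For a double zero the error ratio |p_{n+1}| / |p_n| tends to 1/2 (linear
# convergence), instead of |p_{n+1}| / |p_n|^2 staying bounded.
ratios = [abs(p[n + 1]) / abs(p[n]) for n in range(16)]

for n in range(1, 17):
    print(n, p[n], ratios[n - 1])
```

Each step roughly halves the error, so about one extra binary digit is gained per iteration, which is the same rate the Bisection method achieves.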
One method of handling the problem of multiple roots of a function $f$ is to define

\[ \mu(x) = \frac{f(x)}{f'(x)}. \]

If $p$ is a zero of $f$ of multiplicity $m$ with $f(x) = (x - p)^m q(x)$, then

\[ \mu(x) = \frac{(x - p)^m q(x)}{m(x - p)^{m-1} q(x) + (x - p)^m q'(x)} = (x - p)\frac{q(x)}{m q(x) + (x - p) q'(x)} \]

also has a zero at $p$. However, $q(p) \neq 0$, so

\[ \frac{q(p)}{m q(p) + (p - p) q'(p)} = \frac{1}{m} \neq 0, \]

and $p$ is a simple zero of $\mu(x)$. Newton’s method can then be applied to $\mu(x)$ to give

\[ g(x) = x - \frac{\mu(x)}{\mu'(x)} = x - \frac{f(x)/f'(x)}{\{[f'(x)]^2 - [f(x)][f''(x)]\}/[f'(x)]^2}, \]

which simplifies to

\[ g(x) = x - \frac{f(x) f'(x)}{[f'(x)]^2 - f(x) f''(x)}. \qquad (2.13) \]

If $g$ has the required continuity conditions, functional iteration applied to $g$ will be
quadratically convergent regardless of the multiplicity of the zero of $f$. Theoretically, the
only drawback to this method is the additional calculation of $f''(x)$ and the more laborious
procedure of calculating the iterates. In practice, however, multiple roots can cause serious
round-off problems because the denominator of (2.13) consists of the difference of two
numbers that are both close to 0.


Table 2.9

 n    $p_n$
 1    $-2.3421061 \times 10^{-1}$
 2    $-8.4582788 \times 10^{-3}$
 3    $-1.1889524 \times 10^{-5}$
 4    $-6.8638230 \times 10^{-6}$
 5    $-2.8085217 \times 10^{-7}$


Example 2 In Example 1 it was shown that $f(x) = e^x - x - 1$ has a zero of multiplicity 2 at $x = 0$ and
that Newton’s method with $p_0 = 1$ converges to this zero but not quadratically. Show that the
modification of Newton’s method as given in Eq. (2.13) improves the rate of convergence.


Solution Modified Newton’s method gives

\[ p_1 = p_0 - \frac{f(p_0) f'(p_0)}{f'(p_0)^2 - f(p_0) f''(p_0)} = 1 - \frac{(e - 2)(e - 1)}{(e - 1)^2 - (e - 2)e} \approx -2.3421061 \times 10^{-1}. \]

This is considerably closer to 0 than the first term using Newton’s method, which was
0.58198. Table 2.9 lists the first five approximations to the double zero at $x = 0$. The results
were obtained using a system with ten digits of precision. The relative lack of improvement
in the last two entries is due to the fact that using this system both the numerator and the
denominator approach 0. Consequently there is a loss of significant digits of accuracy as
the approximations approach 0.
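
The iteration of Eq. (2.13) applied to Example 2 can be sketched as follows. This is an illustrative implementation, not the routine used to produce Table 2.9: the function name modified_newton, the stopping rule, and the guard against a zero denominator are assumptions, and because it runs in double precision rather than ten-digit arithmetic, only the first few iterates are expected to agree with the table.

```python
import math


def modified_newton(f, df, d2f, p0, tol=1e-10, max_iter=20):
    """Iterate p <- p - f f' / (f'^2 - f f'') as in Eq. (2.13).

    Returns the list of iterates [p_0, p_1, ...]. Stops when two successive
    iterates differ by less than tol, when max_iter is reached, or when the
    denominator is exactly zero (round-off has consumed all information).
    """
    iterates = [p0]
    p = p0
    for _ in range(max_iter):
        fp, dfp, d2fp = f(p), df(p), d2f(p)
        denom = dfp * dfp - fp * d2fp
        if denom == 0.0:
            break
        p_next = p - fp * dfp / denom
        iterates.append(p_next)
        if abs(p_next - p) < tol:
            break
        p = p_next
    return iterates


def f(x):
    return math.exp(x) - x - 1.0


def df(x):
    return math.exp(x) - 1.0


def d2f(x):
    return math.exp(x)


iterates = modified_newton(f, df, d2f, 1.0)
for n, p_n in enumerate(iterates):
    print(n, p_n)
```

Near the double zero both $f(p_n)f'(p_n)$ and the denominator are differences of nearly equal numbers, so the later iterates carry few correct digits; this is the round-off sensitivity described in the text.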


The following illustrates that the modified Newton’s method converges quadratically
even in the case of a simple zero.


Illustration In Section 2.2 we found that a zero of $f(x) = x^3 + 4x^2 - 10 = 0$ is $p = 1.36523001$.
Here we will compare convergence for a simple zero using both Newton’s method and the
modified Newton’s method listed in Eq. (2.13). Let

\[ \text{(i)} \quad p_n = p_{n-1} - \frac{p_{n-1}^3 + 4p_{n-1}^2 - 10}{3p_{n-1}^2 + 8p_{n-1}}, \quad \text{from Newton’s method} \]

and, from the Modified Newton’s method given by Eq. (2.13),

\[ \text{(ii)} \quad p_n = p_{n-1} - \frac{(p_{n-1}^3 + 4p_{n-1}^2 - 10)(3p_{n-1}^2 + 8p_{n-1})}{(3p_{n-1}^2 + 8p_{n-1})^2 - (p_{n-1}^3 + 4p_{n-1}^2 - 10)(6p_{n-1} + 8)}. \]

With $p_0 = 1.5$, we have

Newton’s method

$p_1 = 1.37333333$, $p_2 = 1.36526201$, and $p_3 = 1.36523001$.

Modified Newton’s method

$p_1 = 1.35689898$, $p_2 = 1.36519585$, and $p_3 = 1.36523001$.


Both methods are rapidly convergent to the actual zero, which is given by both methods as
$p_3$. Note, however, that in the case of a simple zero the original Newton’s method requires
substantially less computation.


Maple contains Modified Newton’s method as described in Eq. (2.13) in its NumericalAnalysis
package. The options for this command are the same as those for the Bisection
method. To obtain results similar to those in Table 2.9 we can use


with(Student[NumericalAnalysis])
f := e^x - x - 1
ModifiedNewton(f, x = 1.0, tolerance = 10^(-10), output = sequence, maxiterations = 20)


Remember that there is sensitivity to round-off error in these calculations, so you might
need to reset Digits in Maple to get the exact values in Table 2.9.


EXERCISE SET 2.4


1. Use Newton’s method to find solutions accurate to within $10^{-5}$ to the following problems.
a. $x^2 - 2xe^{-x} + e^{-2x} = 0$, for $0 \le x \le 1$
b. $\cos(x + \sqrt{2}) + x(x/2 + \sqrt{2}) = 0$, for $-2 \le x \le -1$
c. $x^3 - 3x^2(2^{-x}) + 3x(4^{-x}) - 8^{-x} = 0$, for $0 \le x \le 1$
d. $e^{6x} + 3(\ln 2)^2 e^{2x} - (\ln 8)e^{4x} - (\ln 2)^3 = 0$, for $-1 \le x \le 0$

2. Use Newton’s method to find solutions accurate to within $10^{-5}$ to the following problems.
a. $1 - 4x\cos x + 2x^2 + \cos 2x = 0$, for $0 \le x \le 1$
b. $x^6 + 6x^5 + 9x^4 - 2x^3 - 6x^2 + 1 = 0$, for $-3 \le x \le -2$
c. $\sin 3x + 3e^{-2x}\sin x - 3e^{-x}\sin 2x - e^{-3x} = 0$, for $3 \le x \le 4$
d. $e^{3x} - 27x^6 + 27x^4e^x - 9x^2e^{2x} = 0$, for $3 \le x \le 5$

3. Repeat Exercise 1 using the modified Newton’s method described in Eq. (2.13). Is there an
improvement in speed or accuracy over Exercise 1?

4. Repeat Exercise 2 using the modified Newton’s method described in Eq. (2.13). Is there an
improvement in speed or accuracy over Exercise 2?


5. Use Newton’s method and the modified Newton’s method described in Eq. (2.13) to find a solution
accurate to within $10^{-5}$ to the problem

\[ e^{6x} + 1.441e^{2x} - 2.079e^{4x} - 0.3330 = 0, \quad \text{for } -1 \le x \le 0. \]

This is the same problem as 1(d) with the coefficients replaced by their four-digit approximations.
Compare the solutions to the results in 1(d) and 2(d).


6. Show that the following sequences converge linearly to $p = 0$. How large must $n$ be before
$|p_n - p| \le 5 \times 10^{-2}$?
a. $p_n = \dfrac{1}{n}$, $n \ge 1$
b. $p_n = \dfrac{1}{n^2}$, $n \ge 1$

7. a. Show that for any positive integer $k$, the sequence defined by $p_n = 1/n^k$ converges linearly to
$p = 0$.
b. For each pair of integers $k$ and $m$, determine a number $N$ for which $1/N^k < 10^{-m}$.

8. a. Show that the sequence $p_n = 10^{-2^n}$ converges quadratically to 0.
b. Show that the sequence $p_n = 10^{-n^k}$ does not converge to 0 quadratically, regardless of the size
of the exponent $k > 1$.


9. a. Construct a sequence that converges to 0 of order 3.
b. Suppose $\alpha > 1$. Construct a sequence that converges to 0 of order $\alpha$.


10. Suppose $p$ is a zero of multiplicity $m$ of $f$, where $f''$ is continuous on an open interval containing
$p$. Show that the following fixed-point method has $g'(p) = 0$:

\[ g(x) = x - \frac{m f(x)}{f'(x)}. \]

11. Show that the Bisection Algorithm 2.1 gives a sequence with an error bound that converges linearly
to 0.


12. Suppose that $f$ has $m$ continuous derivatives. Modify the proof of Theorem 2.11 to show that $f$ has
a zero of multiplicity $m$ at $p$ if and only if

\[ 0 = f(p) = f'(p) = \cdots = f^{(m-1)}(p), \quad\text{but}\quad f^{(m)}(p) \neq 0. \]


13. The iterative method to solve $f(x) = 0$, given by the fixed-point method $g(x) = x$, where

\[ p_n = g(p_{n-1}) = p_{n-1} - \frac{f(p_{n-1})}{f'(p_{n-1})} - \frac{f''(p_{n-1})}{2f'(p_{n-1})}\left[\frac{f(p_{n-1})}{f'(p_{n-1})}\right]^2, \quad \text{for } n = 1, 2, 3, \ldots, \]

has $g'(p) = g''(p) = 0$. This will generally yield cubic ($\alpha = 3$) convergence. Expand the analysis of
Example 1 to compare quadratic and cubic convergence.


14. It can be shown (see, for example, [DaB], pp. 228-229) that if $\{p_n\}_{n=0}^{\infty}$ are convergent Secant
method approximations to $p$, the solution to $f(x) = 0$, then a constant $C$ exists with
$|p_{n+1} - p| \approx C\,|p_n - p|\,|p_{n-1} - p|$ for sufficiently large values of $n$. Assume $\{p_n\}$ converges to $p$ of order $\alpha$, and
show that $\alpha = (1 + \sqrt{5})/2$. (Note: This implies that the order of convergence of the Secant method
is approximately 1.62.)


2.5 Accelerating Convergence


Theorem 2.8 indicates that it is rare to have the luxury of quadratic convergence. We now
consider a technique called Aitken’s $\Delta^2$ method that can be used to accelerate the convergence
of a sequence that is linearly convergent, regardless of its origin or application.
Alexander Aitken (1895-1967) used this technique in 1926 to accelerate the rate of convergence
of a series in a paper on algebraic equations [Ai]. This process is similar to one used much earlier
by the Japanese mathematician Takakazu Seki Kowa (1642-1708).


Table 2.10

 n    $p_n$      $\hat{p}_n$
 1    0.54030    0.96178
 2    0.87758    0.98213
 3    0.94496    0.98979
 4    0.96891    0.99342
 5    0.98007    0.99541
 6    0.98614
 7    0.98981
Aitken’s $\Delta^2$ Method


Suppose $\{p_n\}_{n=0}^{\infty}$ is a linearly convergent sequence with limit $p$. To motivate the construction
of a sequence $\{\hat{p}_n\}_{n=0}^{\infty}$ that converges more rapidly to $p$ than does $\{p_n\}_{n=0}^{\infty}$, let us first assume
that the signs of $p_n - p$, $p_{n+1} - p$, and $p_{n+2} - p$ agree and that $n$ is sufficiently large that

\[ \frac{p_{n+1} - p}{p_n - p} \approx \frac{p_{n+2} - p}{p_{n+1} - p}. \]

Then

\[ (p_{n+1} - p)^2 \approx (p_{n+2} - p)(p_n - p), \]

so

\[ p_{n+1}^2 - 2p_{n+1}p + p^2 \approx p_{n+2}p_n - (p_n + p_{n+2})p + p^2 \]

and

\[ (p_{n+2} + p_n - 2p_{n+1})p \approx p_{n+2}p_n - p_{n+1}^2. \]

Solving for $p$ gives

\[ p \approx \frac{p_{n+2}p_n - p_{n+1}^2}{p_{n+2} - 2p_{n+1} + p_n}. \]

Adding and subtracting the terms $p_n^2$ and $2p_np_{n+1}$ in the numerator and grouping terms
appropriately gives

\[ p \approx \frac{p_np_{n+2} - 2p_np_{n+1} + p_n^2 - p_{n+1}^2 + 2p_np_{n+1} - p_n^2}{p_{n+2} - 2p_{n+1} + p_n} \]
\[ = \frac{p_n(p_{n+2} - 2p_{n+1} + p_n) - (p_{n+1}^2 - 2p_np_{n+1} + p_n^2)}{p_{n+2} - 2p_{n+1} + p_n} \]
\[ = p_n - \frac{(p_{n+1} - p_n)^2}{p_{n+2} - 2p_{n+1} + p_n}. \]

Aitken’s $\Delta^2$ method is based on the assumption that the sequence $\{\hat{p}_n\}_{n=0}^{\infty}$, defined by

\[ \hat{p}_n = p_n - \frac{(p_{n+1} - p_n)^2}{p_{n+2} - 2p_{n+1} + p_n}, \qquad (2.14) \]

converges more rapidly to $p$ than does the original sequence $\{p_n\}_{n=0}^{\infty}$.


Example 1 The sequence $\{p_n\}_{n=1}^{\infty}$, where $p_n = \cos(1/n)$, converges linearly to $p = 1$. Determine the
first five terms of the sequence given by Aitken’s $\Delta^2$ method.


Solution In order to determine a term $\hat{p}_n$ of the Aitken’s $\Delta^2$ method sequence we need to
have the terms $p_n$, $p_{n+1}$, and $p_{n+2}$ of the original sequence. So to determine $\hat{p}_5$ we need
the first 7 terms of $\{p_n\}$. These are given in Table 2.10. It certainly appears that $\{\hat{p}_n\}_{n=1}^{\infty}$
converges more rapidly to $p = 1$ than does $\{p_n\}_{n=1}^{\infty}$.
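
The values in Table 2.10 follow directly from Eq. (2.14). The sketch below is one possible way to compute them; the function name aitken is an illustrative choice, and the code assumes the second difference $p_{n+2} - 2p_{n+1} + p_n$ is never zero for the terms supplied (it is not guarded against).

```python
import math


def aitken(seq):
    """Apply Eq. (2.14) to a list [p_0, p_1, ...]; returns len(seq) - 2 terms."""
    accelerated = []
    for n in range(len(seq) - 2):
        delta = seq[n + 1] - seq[n]                        # p_{n+1} - p_n
        delta2 = seq[n + 2] - 2.0 * seq[n + 1] + seq[n]    # p_{n+2} - 2 p_{n+1} + p_n
        accelerated.append(seq[n] - delta * delta / delta2)
    return accelerated


# p_n = cos(1/n) for n = 1, ..., 7, as in Example 1.
p = [math.cos(1.0 / n) for n in range(1, 8)]
p_hat = aitken(p)

for n, (p_n, ph_n) in enumerate(zip(p, p_hat), start=1):
    print(n, round(p_n, 5), round(ph_n, 5))
```

Seven terms of the original sequence yield five accelerated terms, since each $\hat{p}_n$ needs $p_n$, $p_{n+1}$, and $p_{n+2}$.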


The $\Delta$ notation associated with this technique has its origin in the following definition.

Johan Frederik Steffensen (1873-1961) wrote an influential book entitled Interpolation in 1927.


Definition 2.13 For a given sequence $\{p_n\}_{n=0}^{\infty}$, the forward difference $\Delta p_n$ (read “delta $p_n$”) is defined by

\[ \Delta p_n = p_{n+1} - p_n, \quad \text{for } n \ge 0. \]

Higher powers of the operator $\Delta$ are defined recursively by

\[ \Delta^k p_n = \Delta(\Delta^{k-1} p_n), \quad \text{for } k \ge 2. \]


The definition implies that

\[ \Delta^2 p_n = \Delta(p_{n+1} - p_n) = \Delta p_{n+1} - \Delta p_n = (p_{n+2} - p_{n+1}) - (p_{n+1} - p_n). \]

So $\Delta^2 p_n = p_{n+2} - 2p_{n+1} + p_n$, and the formula for $\hat{p}_n$ given in Eq. (2.14) can be written as

\[ \hat{p}_n = p_n - \frac{(\Delta p_n)^2}{\Delta^2 p_n}, \quad \text{for } n \ge 0. \qquad (2.15) \]


To this point in our discussion of Aitken’s $\Delta^2$ method, we have stated that the sequence
$\{\hat{p}_n\}_{n=0}^{\infty}$ converges to $p$ more rapidly than does the original sequence $\{p_n\}_{n=0}^{\infty}$, but we have
not said what is meant by the term “more rapid” convergence. Theorem 2.14 explains and
justifies this terminology. The proof of this theorem is considered in Exercise 16.


Theorem 2.14 Suppose that $\{p_n\}_{n=0}^{\infty}$ is a sequence that converges linearly to the limit $p$ and that

\[ \lim_{n\to\infty} \frac{p_{n+1} - p}{p_n - p} < 1. \]

Then the Aitken’s $\Delta^2$ sequence $\{\hat{p}_n\}_{n=0}^{\infty}$ converges to $p$ faster than $\{p_n\}_{n=0}^{\infty}$ in the sense that

\[ \lim_{n\to\infty} \frac{\hat{p}_n - p}{p_n - p} = 0. \]


Steffensen’s Method 


By applying a modification of Aitken’s $\Delta^2$ method to a linearly convergent sequence obtained
from fixed-point iteration, we can accelerate the convergence to quadratic. This
procedure is known as Steffensen’s method and differs slightly from applying Aitken’s
$\Delta^2$ method directly to the linearly convergent fixed-point iteration sequence. Aitken’s $\Delta^2$
method constructs the terms in order:

\[ p_0, \quad p_1 = g(p_0), \quad p_2 = g(p_1), \quad \hat{p}_0 = \{\Delta^2\}(p_0), \quad p_3 = g(p_2), \quad \hat{p}_1 = \{\Delta^2\}(p_1), \ldots, \]

where $\{\Delta^2\}$ indicates that Eq. (2.15) is used. Steffensen’s method constructs the same
first four terms, $p_0$, $p_1$, $p_2$, and $\hat{p}_0$. However, at this step we assume that $\hat{p}_0$ is a better
approximation to $p$ than is $p_2$ and apply fixed-point iteration to $\hat{p}_0$ instead of $p_2$. Using this
notation, the sequence is

\[ p_0^{(0)}, \quad p_1^{(0)} = g(p_0^{(0)}), \quad p_2^{(0)} = g(p_1^{(0)}), \quad p_0^{(1)} = \{\Delta^2\}(p_0^{(0)}), \quad p_1^{(1)} = g(p_0^{(1)}), \ldots. \]

Every third term of the Steffensen sequence is generated by Eq. (2.15); the others use
fixed-point iteration on the previous term. The process is described in Algorithm 2.6.
Algorithm 2.6 Steffensen’s


To find a solution to $p = g(p)$ given an initial approximation $p_0$:


INPUT initial approximation $p_0$; tolerance TOL; maximum number of iterations $N_0$.
OUTPUT approximate solution $p$ or message of failure.
Step 1 Set $i = 1$.
Step 2 While $i \le N_0$ do Steps 3-6.
    Step 3 Set $p_1 = g(p_0)$; (Compute $p_1^{(i-1)}$.)
               $p_2 = g(p_1)$; (Compute $p_2^{(i-1)}$.)
               $p = p_0 - (p_1 - p_0)^2/(p_2 - 2p_1 + p_0)$. (Compute $p_0^{(i)}$.)
    Step 4 If $|p - p_0| <$ TOL then
               OUTPUT ($p$); (Procedure completed successfully.)
               STOP.
    Step 5 Set $i = i + 1$.
    Step 6 Set $p_0 = p$. (Update $p_0$.)
Step 7 OUTPUT (‘Method failed after $N_0$ iterations, $N_0$ =’, $N_0$);
       (Procedure completed unsuccessfully.)
       STOP.


Note that $\Delta^2 p_n$ might be 0, which would introduce a 0 in the denominator of the next
iterate. If this occurs, we terminate the sequence and select $p_2^{(n-1)}$ as the best approximation.
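
Algorithm 2.6 translates almost line for line into code. The following is a minimal sketch, not a reference implementation: the function name steffensen, the default tolerance, and the choice to raise an exception on failure are assumptions made here. The test function is $g(x) = (10/(x+4))^{1/2}$ with $p_0 = 1.5$, a fixed-point form of $x^3 + 4x^2 - 10 = 0$.

```python
import math


def steffensen(g, p0, tol=1e-9, max_iter=50):
    """Steffensen's method for p = g(p), following Steps 1-7 of Algorithm 2.6."""
    for _ in range(max_iter):
        p1 = g(p0)                       # Step 3: p_1^(i-1)
        p2 = g(p1)                       #         p_2^(i-1)
        denom = p2 - 2.0 * p1 + p0       # second forward difference
        if denom == 0.0:
            # Zero second difference: stop and take p_2 as the best
            # available approximation, as the note after the algorithm suggests.
            return p2
        p = p0 - (p1 - p0) ** 2 / denom  # p_0^(i), Eq. (2.15)
        if abs(p - p0) < tol:            # Step 4
            return p
        p0 = p                           # Step 6
    raise RuntimeError(f"Method failed after {max_iter} iterations")


def g(x):
    return math.sqrt(10.0 / (x + 4.0))


root = steffensen(g, 1.5)
print(root)
```

Each pass through the loop costs two evaluations of $g$ and no derivative evaluations, which is the practical appeal of the method compared with Newton’s method.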


Illustration To solve $x^3 + 4x^2 - 10 = 0$ using Steffensen’s method, let $x^3 + 4x^2 = 10$, divide by $x + 4$,
and solve for $x$. This procedure produces the fixed-point method

\[ g(x) = \left(\frac{10}{x + 4}\right)^{1/2}. \]

We considered this fixed-point method in Table 2.2 column (d) of Section 2.2.


Applying Steffensen’s procedure with $p_0 = 1.5$ gives the values in Table 2.11. The iterate
$p_0^{(2)} = 1.365230013$ is accurate to the ninth decimal place. In this example, Steffensen’s
method gave about the same accuracy as Newton’s method applied to this polynomial.
These results can be seen in the Illustration at the end of Section 2.4.


Table 2.11

 k    $p_0^{(k)}$      $p_1^{(k)}$      $p_2^{(k)}$
 0    1.5              1.348399725      1.367376372
 1    1.365265224      1.365225534      1.365230583
 2    1.365230013


From the Illustration, it appears that Steffensen’s method gives quadratic convergence
without evaluating a derivative, and Theorem 2.15 states that this is the case. The proof of
this theorem can be found in [He2], pp. 90-92, or [IK], pp. 103-107.

Theorem 2.15 Suppose that $x = g(x)$ has the solution $p$ with $g'(p) \neq 1$. If there exists a $\delta > 0$ such
that $g \in C^3[p - \delta, p + \delta]$, then Steffensen’s method gives quadratic convergence for any
$p_0 \in [p - \delta, p + \delta]$.


Steffensen’s method can be implemented in Maple with the NumericalAnalysis package.
For example, after entering the function


g := sqrt(10/(x + 4))


the Maple command


Steffensen(fixedpointiterator = g, x = 1.5, tolerance = 10^(-8), output = information,
maxiterations = 20)


produces the results in Table 2.11, as well as an indication that the final approximation has
a relative error of approximately $7.32 \times 10^{-10}$.


EXERCISE SET 2.5


1. The following sequences are linearly convergent. Generate the first five terms of the sequence $\{\hat{p}_n\}$
using Aitken’s $\Delta^2$ method.
a. $p_0 = 0.5$, $p_n = (2 - e^{p_{n-1}} + p_{n-1}^2)/3$, $n \ge 1$
b. $p_0 = 0.75$, $p_n = (e^{p_{n-1}}/3)^{1/2}$, $n \ge 1$
c. $p_0 = 0.5$, $p_n = 3^{-p_{n-1}}$, $n \ge 1$
d. $p_0 = 0.5$, $p_n = \cos p_{n-1}$, $n \ge 1$

2. Consider the function $f(x) = e^{6x} + 3(\ln 2)^2e^{2x} - (\ln 8)e^{4x} - (\ln 2)^3$. Use Newton’s method with $p_0 = 0$
to approximate a zero of $f$. Generate terms until $|p_{n+1} - p_n| < 0.0002$. Construct the sequence $\{\hat{p}_n\}$.
Is the convergence improved?


3. Let $g(x) = \cos(x - 1)$ and $p_0^{(0)} = 2$. Use Steffensen’s method to find $p_0^{(1)}$.


4. Let $g(x) = 1 + (\sin x)^2$ and $p_0^{(0)} = 1$. Use Steffensen’s method to find $p_0^{(1)}$ and $p_0^{(2)}$.


5. Steffensen’s method is applied to a function $g(x)$ using $p_0^{(0)} = 1$ and $p_2^{(0)} = 3$ to obtain $p_0^{(1)} = 0.75$.
What is $p_1^{(0)}$?

6. Steffensen’s method is applied to a function $g(x)$ using $p_0^{(0)} = 1$ and $p_1^{(0)} = \sqrt{2}$ to obtain $p_0^{(1)} = 2.7802$.
What is $p_2^{(0)}$?


7. Use Steffensen’s method to find, to an accuracy of $10^{-4}$, the root of $x^3 - x - 1 = 0$ that lies in [1, 2],
and compare this to the results of Exercise 6 of Section 2.2.

8. Use Steffensen’s method to find, to an accuracy of $10^{-4}$, the root of $x - 2^{-x} = 0$ that lies in [0, 1],
and compare this to the results of Exercise 8 of Section 2.2.

9. Use Steffensen’s method with $p_0 = 2$ to compute an approximation to $\sqrt{3}$ accurate to within $10^{-4}$.
Compare this result with those obtained in Exercise 9 of Section 2.2 and Exercise 12 of Section 2.1.

10. Use Steffensen’s method with $p_0 = 3$ to compute an approximation to $\sqrt[3]{25}$ accurate to within $10^{-4}$.
Compare this result with those obtained in Exercise 10 of Section 2.2 and Exercise 13 of Section 2.1.


11. Use Steffensen’s method to approximate the solutions of the following equations to within $10^{-5}$.
a. $x = (2 - e^x + x^2)/3$, where $g$ is the function in Exercise 11(a) of Section 2.2.
b. $x = 0.5(\sin x + \cos x)$, where $g$ is the function in Exercise 11(f) of Section 2.2.
c. $x = (e^x/3)^{1/2}$, where $g$ is the function in Exercise 11(c) of Section 2.2.
d. $x = 5^{-x}$, where $g$ is the function in Exercise 11(d) of Section 2.2.

12. Use Steffensen’s method to approximate the solutions of the following equations to within $10^{-5}$.
a. $2 + \sin x - x = 0$, where $g$ is the function in Exercise 12(a) of Section 2.2.
b. $x^3 - 2x - 5 = 0$, where $g$ is the function in Exercise 12(b) of Section 2.2.
c. $3x^2 - e^x = 0$, where $g$ is the function in Exercise 12(c) of Section 2.2.
d. $x - \cos x = 0$, where $g$ is the function in Exercise 12(d) of Section 2.2.


13. The following sequences converge to 0. Use Aitken’s $\Delta^2$ method to generate $\{\hat{p}_n\}$ until $|\hat{p}_n| \le 5 \times 10^{-2}$:
a. $p_n = \dfrac{1}{n}$, $n \ge 1$
b. $p_n = \dfrac{1}{n^2}$, $n \ge 1$


14. A sequence $\{p_n\}$ is said to be superlinearly convergent to $p$ if

\[ \lim_{n\to\infty} \frac{|p_{n+1} - p|}{|p_n - p|} = 0. \]

a. Show that if $p_n \to p$ of order $\alpha$ for $\alpha > 1$, then $\{p_n\}$ is superlinearly convergent to $p$.
b. Show that $p_n = \dfrac{1}{n^n}$ is superlinearly convergent to 0 but does not converge to 0 of order $\alpha$ for any
$\alpha > 1$.


15. Suppose that $\{p_n\}$ is superlinearly convergent to $p$. Show that

\[ \lim_{n\to\infty} \frac{|p_{n+1} - p_n|}{|p_n - p|} = 1. \]


16. Prove Theorem 2.14. [Hint: Let $\delta_n = (p_{n+1} - p)/(p_n - p) - \lambda$, and show that $\lim_{n\to\infty} \delta_n = 0$. Then
express $(\hat{p}_{n+1} - p)/(p_n - p)$ in terms of $\delta_n$, $\delta_{n+1}$, and $\lambda$.]


17. Let $P_n(x)$ be the $n$th Taylor polynomial for $f(x) = e^x$ expanded about $x_0 = 0$.
a. For fixed $x$, show that $p_n = P_n(x)$ satisfies the hypotheses of Theorem 2.14.
b. Let $x = 1$, and use Aitken’s $\Delta^2$ method to generate the sequence $\hat{p}_0, \ldots, \hat{p}_8$.
c. Does Aitken’s method accelerate convergence in this situation?


2.6 Zeros of Polynomials and Müller’s Method


A polynomial of degree $n$ has the form

\[ P(x) = a_nx^n + a_{n-1}x^{n-1} + \cdots + a_1x + a_0, \]

where the $a_i$’s, called the coefficients of $P$, are constants and $a_n \neq 0$. The zero function,
$P(x) = 0$ for all values of $x$, is considered a polynomial but is assigned no degree.


Algebraic Polynomials 


Theorem 2.16 (Fundamental Theorem of Algebra)


If $P(x)$ is a polynomial of degree $n \ge 1$ with real or complex coefficients, then $P(x) = 0$
has at least one (possibly complex) root.


Although the Fundamental Theorem of Algebra is basic to any study of elementary 
functions, the usual proof requires techniques from the study of complex function theory. 
The reader is referred to [SaS], p. 155, for the culmination of a systematic development of 
the topics needed to prove the Theorem. 


Example 1 Determine all the zeros of the polynomial $P(x) = x^3 - 5x^2 + 17x - 13$.


Solution It is easily verified that $P(1) = 1 - 5 + 17 - 13 = 0$, so $x = 1$ is a zero of $P$ and
$(x - 1)$ is a factor of the polynomial. Dividing $P(x)$ by $x - 1$ gives

\[ P(x) = (x - 1)(x^2 - 4x + 13). \]

To determine the zeros of $x^2 - 4x + 13$ we use the quadratic formula in its standard form,
which gives the complex zeros

\[ \frac{-(-4) \pm \sqrt{(-4)^2 - 4(1)(13)}}{2(1)} = \frac{4 \pm \sqrt{-36}}{2} = 2 \pm 3i. \]

Hence the third-degree polynomial $P(x)$ has three zeros, $x_1 = 1$, $x_2 = 2 - 3i$, and
$x_3 = 2 + 3i$.


Carl Friedrich Gauss (1777-1855), one of the greatest mathematicians of all time, proved the
Fundamental Theorem of Algebra in his doctoral dissertation and published it in 1799. He
published different proofs of this result throughout his lifetime, in 1815, 1816, and as late as
1848. The result had been stated, without proof, by Albert Girard (1595-1632), and partial
proofs had been given by Jean d’Alembert (1717-1783), Euler, and Lagrange.

William Horner (1786-1837) was a child prodigy who became headmaster of a school in Bristol
at age 18. Horner’s method for solving algebraic equations was published in 1819 in the
Philosophical Transactions of the Royal Society.


In the preceding example we found that the third-degree polynomial had three distinct 
zeros. An important consequence of the Fundamental Theorem of Algebra is the following 
corollary. It states that this is always the case, provided that when the zeros are not distinct 
we count the number of zeros according to their multiplicities. 


Corollary 2.17 If $P(x)$ is a polynomial of degree $n \ge 1$ with real or complex coefficients, then there exist
unique constants $x_1, x_2, \ldots, x_k$, possibly complex, and unique positive integers $m_1, m_2, \ldots,
m_k$, such that $\sum_{i=1}^{k} m_i = n$ and

\[ P(x) = a_n(x - x_1)^{m_1}(x - x_2)^{m_2}\cdots(x - x_k)^{m_k}. \]


By Corollary 2.17 the collection of zeros of a polynomial is unique and, if each zero
$x_i$ is counted as many times as its multiplicity $m_i$, a polynomial of degree $n$ has exactly $n$
zeros.

The following corollary of the Fundamental Theorem of Algebra is used often in this
section and in later chapters.


Corollary 2.18 Let $P(x)$ and $Q(x)$ be polynomials of degree at most $n$. If $x_1, x_2, \ldots, x_k$, with $k > n$, are
distinct numbers with $P(x_i) = Q(x_i)$ for $i = 1, 2, \ldots, k$, then $P(x) = Q(x)$ for all values
of $x$.


This result implies that to show that two polynomials of degree less than or equal to $n$
are the same, we only need to show that they agree at $n + 1$ values. This will be frequently
used, particularly in Chapters 3 and 8.


Horner’s Method 


To use Newton’s method to locate approximate zeros of a polynomial P(x), we need to 
evaluate P(x) and P’(x) at specified values. Since P(x) and P’(x) are both polynomials, 
computational efficiency requires that the evaluation of these functions be done in the nested 
manner discussed in Section 1.2. Horner’s method incorporates this nesting technique, and, 
as a consequence, requires only n multiplications and n additions to evaluate an arbitrary 
nth-degree polynomial. 


Theorem 2.19 (Horner’s Method)
Let

\[ P(x) = a_nx^n + a_{n-1}x^{n-1} + \cdots + a_1x + a_0. \]

Define $b_n = a_n$ and

\[ b_k = a_k + b_{k+1}x_0, \quad \text{for } k = n-1, n-2, \ldots, 1, 0. \]

Then $b_0 = P(x_0)$. Moreover, if

\[ Q(x) = b_nx^{n-1} + b_{n-1}x^{n-2} + \cdots + b_2x + b_1, \]

then

\[ P(x) = (x - x_0)Q(x) + b_0. \]


Proof By the definition of $Q(x)$,

\[ (x - x_0)Q(x) + b_0 = (x - x_0)(b_nx^{n-1} + \cdots + b_2x + b_1) + b_0 \]
\[ = (b_nx^n + b_{n-1}x^{n-1} + \cdots + b_2x^2 + b_1x) - (b_nx_0x^{n-1} + \cdots + b_2x_0x + b_1x_0) + b_0 \]
\[ = b_nx^n + (b_{n-1} - b_nx_0)x^{n-1} + \cdots + (b_1 - b_2x_0)x + (b_0 - b_1x_0). \]

By the hypothesis, $b_n = a_n$ and $b_k - b_{k+1}x_0 = a_k$, so

\[ (x - x_0)Q(x) + b_0 = P(x) \quad\text{and}\quad b_0 = P(x_0). \]


Paolo Ruffini (1765-1822) had described a similar method which won him the gold medal from the
Italian Mathematical Society for Science. Neither Ruffini nor Horner was the first to discover
this method; it was known in China at least 500 years earlier.

The word synthetic has its roots in various languages. In standard English it generally provides the
sense of something that is “false” or “substituted”. But in mathematics it takes the form of
something that is “grouped together”. Synthetic geometry treats shapes as whole, rather
than as individual objects, which is the style in analytic geometry. In synthetic division of
polynomials, the various powers of the variables are not explicitly given but kept grouped together.


Use Horner’s method to evaluate P(x) = 2x* — 3x” + 3x — 4 at xy = —2. 


Solution When we use hand calculation in Horner’s method, we first construct a table, 
which suggests the synthetic division name that is often applied to the technique. For this 
problem, the table appears as follows: 


             Coefficient    Coefficient    Coefficient    Coefficient     Constant
             of x^4         of x^3         of x^2         of x            term
x_0 = -2     a_4 = 2        a_3 = 0        a_2 = -3       a_1 = 3         a_0 = -4
                            b_4 x_0 = -4   b_3 x_0 = 8    b_2 x_0 = -10   b_1 x_0 = 14
             b_4 = 2        b_3 = -4       b_2 = 5        b_1 = -7        b_0 = 10

So,

\[ P(x) = (x + 2)(2x^3 - 4x^2 + 5x - 7) + 10. \]


An additional advantage of using the Horner (or synthetic-division) procedure is that, since

\[ P(x) = (x - x_0) Q(x) + b_0, \]

where

\[ Q(x) = b_n x^{n-1} + b_{n-1} x^{n-2} + \cdots + b_2 x + b_1, \]

differentiating with respect to $x$ gives

\[ P'(x) = Q(x) + (x - x_0) Q'(x) \quad \text{and} \quad P'(x_0) = Q(x_0). \tag{2.16} \]

When the Newton-Raphson method is being used to find an approximate zero of a polynomial, $P(x)$ and $P'(x)$ can be evaluated in the same manner.
Example 3 Find an approximation to a zero of

\[ P(x) = 2x^4 - 3x^2 + 3x - 4, \]

using Newton's method with $x_0 = -2$ and synthetic division to evaluate $P(x_n)$ and $P'(x_n)$ for each iterate $x_n$.


Solution With $x_0 = -2$ as an initial approximation, we obtained $P(-2)$ in Example 2 by

x_0 = -2 |   2     0    -3     3    -4
         |        -4     8   -10    14
         |   2    -4     5    -7    10  = P(-2).

Using Theorem 2.19 and Eq. (2.16),

\[ Q(x) = 2x^3 - 4x^2 + 5x - 7 \quad \text{and} \quad P'(-2) = Q(-2), \]

so $P'(-2)$ can be found by evaluating $Q(-2)$ in a similar manner:

x_0 = -2 |   2    -4     5    -7
         |        -4    16   -42
         |   2    -8    21   -49  = Q(-2) = P'(-2)

and

\[ x_1 = x_0 - \frac{P(x_0)}{P'(x_0)} = x_0 - \frac{P(x_0)}{Q(x_0)} = -2 - \frac{10}{-49} \approx -1.796. \]

Repeating the procedure to find $x_2$ gives

-1.796 |   2     0        -3        3        -4
       |        -3.592     6.451   -6.197     5.742
       |   2    -3.592     3.451   -3.197     1.742  = P(x_1)
       |        -3.592    12.902  -29.368
       |   2    -7.184    16.353  -32.565  = Q(x_1) = P'(x_1).

So $P(-1.796) = 1.742$, $P'(-1.796) = Q(-1.796) = -32.565$, and

\[ x_2 = -1.796 - \frac{1.742}{-32.565} \approx -1.7425. \]

In a similar manner, $x_3 = -1.73897$, and an actual zero to five decimal places is $-1.73896$.
Note that the polynomial $Q(x)$ depends on the approximation being used and changes from iterate to iterate.


Algorithm 2.7 computes $P(x_0)$ and $P'(x_0)$ using Horner's method.

Algorithm 2.7 (Horner's)

To evaluate the polynomial

\[ P(x) = a_n x^n + a_{n-1} x^{n-1} + \cdots + a_1 x + a_0 = (x - x_0) Q(x) + b_0 \]

and its derivative at $x_0$:


INPUT  degree n; coefficients a_0, a_1, ..., a_n; x_0.
OUTPUT y = P(x_0); z = P'(x_0).


Step 1  Set y = a_n;  (Compute b_n for P.)
            z = a_n.  (Compute b_{n-1} for Q.)
Step 2  For j = n-1, n-2, ..., 1
            set y = x_0 y + a_j;  (Compute b_j for P.)
                z = x_0 z + y.    (Compute b_{j-1} for Q.)
Step 3  Set y = x_0 y + a_0.  (Compute b_0 for P.)


Step 4  OUTPUT (y, z);
        STOP.
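The steps of Algorithm 2.7 translate almost line for line into code. The following Python sketch is an illustration only, not part of the text: the function names `horner` and `newton_horner`, the highest-degree-first coefficient order, and the default tolerance are assumptions made for this example. It evaluates $P(x_0)$ and $P'(x_0)$ together and then uses the pair inside a Newton iteration, as in Example 3.

```python
def horner(coeffs, x0):
    """Return (P(x0), P'(x0)) using the nesting of Algorithm 2.7.

    coeffs lists a_n, a_{n-1}, ..., a_0 (highest degree first); this
    ordering is an assumption of the sketch.
    """
    y = coeffs[0]              # Step 1: b_n for P
    z = coeffs[0]              # Step 1: b_{n-1} for Q
    for a in coeffs[1:-1]:     # Step 2: j = n-1, ..., 1
        y = x0 * y + a         # b_j for P
        z = x0 * z + y         # b_{j-1} for Q
    if len(coeffs) > 1:
        y = x0 * y + coeffs[-1]   # Step 3: b_0 for P
    else:
        z = 0                  # a constant polynomial has zero derivative
    return y, z


def newton_horner(coeffs, x0, tol=1e-5, max_iter=50):
    """Newton's method for a polynomial, with P and P' from horner()."""
    x = x0
    for _ in range(max_iter):
        p, dp = horner(coeffs, x)
        if dp == 0:
            raise ZeroDivisionError("P'(x) vanished; Newton step undefined")
        x_new = x - p / dp
        if abs(x_new - x) < tol:
            return x_new
        x = x_new
    raise RuntimeError("no convergence within max_iter iterations")


# P(x) = 2x^4 - 3x^2 + 3x - 4 from Examples 2 and 3.
p_coeffs = [2, 0, -3, 3, -4]
value, slope = horner(p_coeffs, -2)     # the tables give 10 and -49
root = newton_horner(p_coeffs, -2.0)    # the text reports a zero near -1.73896
```

Returning both values from one pass mirrors the point of Eq. (2.16): the derivative costs only one extra multiplication and addition per coefficient.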


If the Nth iterate, $x_N$, in Newton's method is an approximate zero for $P$, then

\[ P(x) = (x - x_N) Q(x) + b_0 = (x - x_N) Q(x) + P(x_N) \approx (x - x_N) Q(x), \]

so $x - x_N$ is an approximate factor of $P(x)$. Letting $\hat{x}_1 = x_N$ be the approximate zero of $P$ and $Q_1(x) \equiv Q(x)$ be the approximate factor gives

\[ P(x) \approx (x - \hat{x}_1) Q_1(x). \]

We can find a second approximate zero of $P$ by applying Newton's method to $Q_1(x)$.

If $P(x)$ is an nth-degree polynomial with $n$ real zeros, this procedure applied repeatedly
will eventually result in $(n - 2)$ approximate zeros of $P$ and an approximate quadratic factor
$Q_{n-2}(x)$. At this stage, $Q_{n-2}(x) = 0$ can be solved by the quadratic formula to find the last
two approximate zeros of $P$. Although this method can be used to find all the approximate
zeros, it depends on repeated use of approximations and can lead to inaccurate results.

The procedure just described is called deflation. The accuracy difficulty with deflation
is due to the fact that, when we obtain the approximate zeros of $P(x)$, Newton's method is
used on the reduced polynomial $Q_k(x)$, that is, the polynomial having the property that

\[ P(x) \approx (x - \hat{x}_1)(x - \hat{x}_2) \cdots (x - \hat{x}_k) Q_k(x). \]

An approximate zero $\hat{x}_{k+1}$ of $Q_k$ will generally not approximate a root of $P(x) = 0$ as well
as it does a root of the reduced equation $Q_k(x) = 0$, and inaccuracy increases as $k$ increases.
One way to eliminate this difficulty is to use the reduced equations to find approximations $\hat{x}_2$,
$\hat{x}_3, \ldots, \hat{x}_k$ to the zeros of $P$, and then improve these approximations by applying Newton's
method to the original polynomial $P(x)$.
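Deflation followed by a polishing pass on the original polynomial can be sketched as follows. This is a minimal illustration under stated assumptions, not a routine from the text: it assumes every zero is real and simple, that Newton's method converges from the single starting value `start`, and the helper names `synthetic_division`, `newton`, and `real_zeros` are invented for the example.

```python
import math


def synthetic_division(coeffs, x0):
    """Divide P by (x - x0), with coeffs = [a_n, ..., a_0].

    Returns (quotient coefficients [b_n, ..., b_1], remainder b_0 = P(x0)).
    """
    b = [coeffs[0]]
    for a in coeffs[1:]:
        b.append(a + b[-1] * x0)
    return b[:-1], b[-1]


def newton(coeffs, x, tol=1e-10, max_iter=100):
    """Newton's method on a polynomial of degree at least one."""
    for _ in range(max_iter):
        q, p = synthetic_division(coeffs, x)   # p = P(x)
        _, dp = synthetic_division(q, x)       # dp = Q(x) = P'(x)
        if dp == 0:
            raise ZeroDivisionError("P'(x) vanished; Newton step undefined")
        step = p / dp
        x -= step
        if abs(step) < tol:
            return x
    raise RuntimeError("Newton's method did not converge")


def real_zeros(coeffs, start=0.0):
    """Approximate all zeros of a polynomial assumed to have only real zeros."""
    if len(coeffs) < 3:
        raise ValueError("this sketch expects degree at least 2")
    work = [float(c) for c in coeffs]
    rough = []
    while len(work) > 3:                        # deflate down to a quadratic
        r = newton(work, start)                 # zero of the reduced polynomial
        rough.append(r)
        work, _ = synthetic_division(work, r)   # next reduced polynomial
    a, b, c = work
    disc = b * b - 4 * a * c
    if disc < 0:
        raise ValueError("remaining quadratic factor has complex zeros")
    s = math.sqrt(disc)
    rough += [(-b - s) / (2 * a), (-b + s) / (2 * a)]
    # Polish every approximation with Newton's method on the ORIGINAL polynomial.
    return [newton(coeffs, r) for r in rough]


# Illustrative polynomial (x - 1)(x - 2)(x - 3)(x - 4); not taken from the text.
zeros = real_zeros([1, -10, 35, -50, 24])
```

The final list comprehension is the remedy described above: each zero found from a reduced equation is used only as a starting value for Newton's method on $P$ itself.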


Complex Zeros: Müller's Method


One problem with applying the Secant, False Position, or Newton’s method to polynomials 
is the possibility of the polynomial having complex roots even when all the coefficients are 
Theorem 2.20


Müller's method is similar to the
Secant method. But whereas the
Secant method uses a line
through two points on the curve
to approximate the root, Müller's
method uses a parabola through 
three points on the curve for the 
approximation. 


Figure 2.13 




real numbers. If the initial approximation is a real number, all subsequent approximations 
will also be real numbers. One way to overcome this difficulty is to begin with a complex 
initial approximation and do all the computations using complex arithmetic. An alternative 
approach has its basis in the following theorem. 


If $z = a + bi$ is a complex zero of multiplicity $m$ of the polynomial $P(x)$ with real coefficients,
then $\bar{z} = a - bi$ is also a zero of multiplicity $m$ of the polynomial $P(x)$, and
$(x^2 - 2ax + a^2 + b^2)^m$ is a factor of $P(x)$.


A synthetic division involving quadratic polynomials can be devised to approximately 
factor the polynomial so that one term will be a quadratic polynomial whose complex roots 
are approximations to the roots of the original polynomial. This technique was described 
in some detail in our second edition [BFR]. Instead of proceeding along these lines, we 
will now consider a method first presented by D. E. Müller [Mu]. This technique can be
used for any root-finding problem, but it is particularly useful for approximating the roots 
of polynomials. 

The Secant method begins with two initial approximations $p_0$ and $p_1$ and determines
the next approximation $p_2$ as the intersection of the x-axis with the line through $(p_0, f(p_0))$
and $(p_1, f(p_1))$. (See Figure 2.13(a).) Müller's method uses three initial approximations,
$p_0$, $p_1$, and $p_2$, and determines the next approximation $p_3$ by considering the intersection
of the x-axis with the parabola through $(p_0, f(p_0))$, $(p_1, f(p_1))$, and $(p_2, f(p_2))$. (See
Figure 2.13(b).)


The derivation of Müller's method begins by considering the quadratic polynomial

\[ P(x) = a(x - p_2)^2 + b(x - p_2) + c \]

that passes through $(p_0, f(p_0))$, $(p_1, f(p_1))$, and $(p_2, f(p_2))$. The constants $a$, $b$, and $c$
can be determined from the conditions

\[ f(p_0) = a(p_0 - p_2)^2 + b(p_0 - p_2) + c, \tag{2.17} \]

\[ f(p_1) = a(p_1 - p_2)^2 + b(p_1 - p_2) + c, \tag{2.18} \]

and

\[ f(p_2) = a \cdot 0^2 + b \cdot 0 + c = c \tag{2.19} \]
to be

\[ c = f(p_2), \tag{2.20} \]

\[ b = \frac{(p_0 - p_2)^2 [f(p_1) - f(p_2)] - (p_1 - p_2)^2 [f(p_0) - f(p_2)]}{(p_0 - p_2)(p_1 - p_2)(p_0 - p_1)}, \tag{2.21} \]

and

\[ a = \frac{(p_1 - p_2)[f(p_0) - f(p_2)] - (p_0 - p_2)[f(p_1) - f(p_2)]}{(p_0 - p_2)(p_1 - p_2)(p_0 - p_1)}. \tag{2.22} \]


To determine $p_3$, a zero of $P$, we apply the quadratic formula to $P(x) = 0$. However, because
of round-off error problems caused by the subtraction of nearly equal numbers, we apply
the formula in the manner prescribed in Eqs. (1.2) and (1.3) of Section 1.2:

\[ p_3 - p_2 = \frac{-2c}{b \pm \sqrt{b^2 - 4ac}}. \]

This formula gives two possibilities for $p_3$, depending on the sign preceding the radical term.
In Müller's method, the sign is chosen to agree with the sign of $b$. Chosen in this manner,
the denominator will be the largest in magnitude and will result in $p_3$ being selected as the
closest zero of $P$ to $p_2$. Thus

\[ p_3 = p_2 - \frac{2c}{b + \operatorname{sgn}(b)\sqrt{b^2 - 4ac}}, \]

where $a$, $b$, and $c$ are given in Eqs. (2.20) through (2.22).
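A single step of this construction can be checked numerically. The Python sketch below is illustrative only: the function name `muller_step` is an assumption, and taking the sign factor as $b/|b|$ (with $+1$ when $b = 0$) is one way to realize "the sign of $b$" when complex arithmetic is in play. It builds $a$, $b$, and $c$ from Eqs. (2.20) through (2.22) and returns $p_3$.

```python
import cmath


def muller_step(f, p0, p1, p2):
    """One Müller step: fit the parabola through three points, return p3."""
    f0, f1, f2 = f(p0), f(p1), f(p2)
    denom = (p0 - p2) * (p1 - p2) * (p0 - p1)
    c = f2                                                              # Eq. (2.20)
    b = ((p0 - p2) ** 2 * (f1 - f2) - (p1 - p2) ** 2 * (f0 - f2)) / denom   # Eq. (2.21)
    a = ((p1 - p2) * (f0 - f2) - (p0 - p2) * (f1 - f2)) / denom         # Eq. (2.22)
    radical = cmath.sqrt(b * b - 4 * a * c)    # complex when b^2 - 4ac < 0
    sign = b / abs(b) if b != 0 else 1.0       # "sign of b" (assumed convention)
    return p2 - 2 * c / (b + sign * radical)


def f(x):
    # The quartic used in the Illustration for Algorithm 2.8.
    return x**4 - 3 * x**3 + x**2 + x + 1


p3 = muller_step(f, 0.5, -0.5, 0.0)
```

With the starting values $p_0 = 0.5$, $p_1 = -0.5$, and $p_2 = 0$, the three function values are $1.4375$, $1.1875$, and $1$, giving $a = 1.25$, $b = 0.25$, and $c = 1$, so $b^2 - 4ac < 0$ and the first iterate is already complex.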

Once $p_3$ is determined, the procedure is reinitialized using $p_1$, $p_2$, and $p_3$ in place of $p_0$,
$p_1$, and $p_2$ to determine the next approximation, $p_4$. The method continues until a satisfactory
conclusion is obtained. At each step, the method involves the radical $\sqrt{b^2 - 4ac}$, so the
method gives approximate complex roots when $b^2 - 4ac < 0$. Algorithm 2.8 implements
this procedure.


Algorithm 2.8 (Müller's)


To find a solution to $f(x) = 0$ given three approximations, $p_0$, $p_1$, and $p_2$:


INPUT  p_0, p_1, p_2; tolerance TOL; maximum number of iterations N_0.
OUTPUT approximate solution p or message of failure.


Step 1  Set h_1 = p_1 - p_0;
            h_2 = p_2 - p_1;
            δ_1 = (f(p_1) - f(p_0))/h_1;
            δ_2 = (f(p_2) - f(p_1))/h_2;
            d = (δ_2 - δ_1)/(h_2 + h_1);
            i = 3.

Step 2  While i ≤ N_0 do Steps 3-7.

    Step 3  b = δ_2 + h_2 d;
            D = (b^2 - 4 f(p_2) d)^{1/2}.  (Note: May require complex arithmetic.)
    Step 4  If |b - D| < |b + D| then set E = b + D
                                 else set E = b - D.

    Step 5  Set h = -2 f(p_2)/E;
                p = p_2 + h.
    Step 6  If |h| < TOL then
                OUTPUT (p);  (The procedure was successful.)
                STOP.
    Step 7  Set p_0 = p_1;  (Prepare for next iteration.)
                p_1 = p_2;
                p_2 = p;
                h_1 = p_1 - p_0;
                h_2 = p_2 - p_1;
                δ_1 = (f(p_1) - f(p_0))/h_1;
                δ_2 = (f(p_2) - f(p_1))/h_2;
                d = (δ_2 - δ_1)/(h_2 + h_1);
                i = i + 1.

Step 8  OUTPUT ('Method failed after N_0 iterations, N_0 =', N_0);
        (The procedure was unsuccessful.)
        STOP.
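Algorithm 2.8 can be sketched in Python as follows. This is an illustration under assumptions rather than a reference implementation: the function name `muller`, the default arguments, and the use of `cmath.sqrt` so that every discriminant is handled in complex arithmetic are choices made for this example.

```python
import cmath


def muller(f, p0, p1, p2, tol=1e-5, max_iter=100):
    """Müller's method following the steps of Algorithm 2.8."""
    # Step 1: divided differences through the three starting points.
    h1 = p1 - p0
    h2 = p2 - p1
    d1 = (f(p1) - f(p0)) / h1
    d2 = (f(p2) - f(p1)) / h2
    d = (d2 - d1) / (h2 + h1)
    for _ in range(3, max_iter + 1):            # Step 2
        b = d2 + h2 * d                         # Step 3
        D = cmath.sqrt(b * b - 4 * f(p2) * d)   # complex square root
        # Step 4: keep the denominator of larger magnitude.
        E = b + D if abs(b - D) < abs(b + D) else b - D
        if E == 0:
            raise ZeroDivisionError("parabola is a nonzero constant")
        h = -2 * f(p2) / E                      # Step 5
        p = p2 + h
        if abs(h) < tol:                        # Step 6
            return p
        p0, p1, p2 = p1, p2, p                  # Step 7
        h1 = p1 - p0
        h2 = p2 - p1
        d1 = (f(p1) - f(p0)) / h1
        d2 = (f(p2) - f(p1)) / h2
        d = (d2 - d1) / (h2 + h1)
    raise RuntimeError("method failed after max_iter iterations")   # Step 8


def f(x):
    return x**4 - 3 * x**3 + x**2 + x + 1


real_small = muller(f, 0.5, 1.0, 1.5)
real_large = muller(f, 1.5, 2.0, 2.5)
complex_zero = muller(f, 0.5, -0.5, 0.0)
```

The results are returned as Python `complex` values even when the zero is real. Because $f$ has real coefficients, its complex zeros occur in conjugate pairs, and which member of the pair the iteration reaches depends on how the tie $|b - D| = |b + D|$ is broken in Step 4 on the first pass.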


Illustration Consider the polynomial $f(x) = x^4 - 3x^3 + x^2 + x + 1$, part of whose graph is shown in
Figure 2.14.


Figure 2.14
Three sets of three initial points will be used with Algorithm 2.8 and TOL = $10^{-5}$ to
approximate the zeros of $f$. The first set will use $p_0 = 0.5$, $p_1 = -0.5$, and $p_2 = 0$. The
parabola passing through these points has complex roots because it does not intersect the
x-axis. Table 2.12 gives approximations to the corresponding complex zeros of $f$.

Table 2.12    p_0 = 0.5, p_1 = -0.5, p_2 = 0

i    p_i                         f(p_i)
3    -0.100000 + 0.888819i       -0.01120000 + 3.014875548i
4    -0.492146 + 0.447031i       -0.1691201 - 0.7367331502i
5    -0.352226 + 0.484132i       -0.1786004 + 0.0181872213i
6    -0.340229 + 0.443036i        0.01197670 - 0.0105562185i
7    -0.339095 + 0.446656i       -0.0010550 + 0.000387261i
8    -0.339093 + 0.446630i        0.000000 + 0.000000i
9    -0.339093 + 0.446630i        0.000000 + 0.000000i
Table 2.13 gives the approximations to the two real zeros of $f$. The smaller of these uses
$p_0 = 0.5$, $p_1 = 1.0$, and $p_2 = 1.5$, and the larger root is approximated when $p_0 = 1.5$,
$p_1 = 2.0$, and $p_2 = 2.5$.


Table 2.13

p_0 = 0.5, p_1 = 1.0, p_2 = 1.5        p_0 = 1.5, p_1 = 2.0, p_2 = 2.5
i    p_i        f(p_i)                 i    p_i        f(p_i)
3    1.40637    -0.04851               3    2.24733    -0.24507
4    1.38878     0.00174               4    2.28652    -0.01446
5    1.38939     0.00000               5    2.28878    -0.00012
6    1.38939     0.00000               6    2.28880     0.00000
                                       7    2.28879     0.00000

The values in the tables are accurate approximations to the places listed. 


We used Maple to generate the results in Table 2.12. To find the first result in the table,
define f(x) with

    f := x -> x^4 - 3*x^3 + x^2 + x + 1

Then enter the initial approximations with

    p0 := 0.5; p1 := -0.5; p2 := 0.0

and evaluate the function at these points with

    f0 := f(p0); f1 := f(p1); f2 := f(p2)

To determine the coefficients a, b, c, and the approximate solution, enter

    c := f2;
    b := ((p0 - p2)^2 * (f1 - f2) - (p1 - p2)^2 * (f0 - f2)) / ((p0 - p2) * (p1 - p2) * (p0 - p1))
    a := ((p1 - p2) * (f0 - f2) - (p0 - p2) * (f1 - f2)) / ((p0 - p2) * (p1 - p2) * (p0 - p1))
    p3 := p2 - 2*c / (b + (b/abs(b)) * sqrt(b^2 - 4*a*c))

This produces the final Maple output

    -0.1000000000 + 0.8888194418 I

and evaluating at this approximation gives f(p3) as

    -0.0112000001 + 3.014875548 I

This is our first approximation, as seen in Table 2.12.

The illustration shows that Müller's method can approximate the roots of polynomials
with a variety of starting values. In fact, Müller's method generally converges to the root of a
polynomial for any initial approximation choice, although problems can be constructed for
which convergence will not occur. For example, suppose that for some $i$ we have $f(p_i) =
f(p_{i+1}) = f(p_{i+2}) \neq 0$. The quadratic equation then reduces to a nonzero constant
function and never intersects the x-axis. This is not usually the case, however, and general-purpose
software packages using Müller's method request only one initial approximation
per root and will even supply this approximation as an option.


EXERCISE SET 2.6


1. Find the approximations to within $10^{-4}$ to all the real zeros of the following polynomials using
Newton's method.

a. $f(x) = x^3 - 2x^2 - 5$

b. $f(x) = x^3 + 3x^2 - 1$

c. $f(x) = x^3 - x - 1$

d. $f(x) = x^4 + 2x^2 - x - 3$

e. $f(x) = x^3 + 4.001x^2 + 4.002x + 1.101$

f. $f(x) = x^5 - x^4 + 2x^3 - 3x^2 + x - 4$


2. Find approximations to within $10^{-5}$ to all the zeros of each of the following polynomials by first
finding the real zeros using Newton's method and then reducing to polynomials of lower degree to
determine any complex zeros.

a. $f(x) = x^4 + 5x^3 - 9x^2 - 85x - 136$

b. $f(x) = x^4 - 2x^3 - 12x^2 + 16x - 40$

c. $f(x) = x^4 + x^3 + 3x^2 + 2x + 2$

d. $f(x) = x^5 + 11x^4 - 21x^3 - 10x^2 - 21x - 5$

e. $f(x) = 16x^4 + 88x^3 + 159x^2 + 76x - 240$

f. $f(x) = x^4 - 4x^2 - 3x + 5$

g. $f(x) = x^4 - 2x^3 - 4x^2 + 4x + 4$

h. $f(x) = x^3 - 7x^2 + 14x - 6$


3. Repeat Exercise 1 using Müller's method.
4. Repeat Exercise 2 using Müller's method.


5. Use Newton's method to find, within $10^{-3}$, the zeros and critical points of the following functions.
Use this information to sketch the graph of $f$.

a. $f(x) = x^3 - 9x^2 + 12$        b. $f(x) = x^4 - 2x^3 - 5x^2 + 12x - 5$


6. $f(x) = 10x^3 - 8.3x^2 + 2.295x - 0.21141 = 0$ has a root at $x = 0.29$. Use Newton's method with an
initial approximation $x_0 = 0.28$ to attempt to find this root. Explain what happens.


7. Use Maple to find a real zero of the polynomial $f(x) = x^3 + 4x - 4$.
8. Use Maple to find a real zero of the polynomial $f(x) = x^3 - 2x - 5$.


9. Use each of the following methods to find a solution in [0.1, 1] accurate to within $10^{-4}$ for
$600x^4 - 550x^3 + 200x^2 - 20x - 1 = 0$.

a. Bisection method        c. Secant method              e. Müller's method
b. Newton's method         d. method of False Position
10. Two ladders crisscross an alley of width $W$. Each ladder reaches from the base of one wall to some
point on the opposite wall. The ladders cross at a height $H$ above the pavement. Find $W$ given that
the lengths of the ladders are $x_1 = 20$ ft and $x_2 = 30$ ft, and that $H = 8$ ft.


11. A can in the shape of a right circular cylinder is to be constructed to contain 1000 cm$^3$. The circular
top and bottom of the can must have a radius of 0.25 cm more than the radius of the can so that the
excess can be used to form a seal with the side. The sheet of material being formed into the side of
the can must also be 0.25 cm longer than the circumference of the can so that a seal can be formed.
Find, to within $10^{-4}$, the minimal amount of material needed to construct the can.


12. In 1224, Leonardo of Pisa, better known as Fibonacci, answered a mathematical challenge of John of
Palermo in the presence of Emperor Frederick II: find a root of the equation $x^3 + 2x^2 + 10x = 20$. He
first showed that the equation had no rational roots and no Euclidean irrational root, that is, no root
in any of the forms $a \pm \sqrt{b}$, $\sqrt{a} \pm \sqrt{b}$, $\sqrt{a \pm \sqrt{b}}$, or $\sqrt{\sqrt{a} \pm \sqrt{b}}$, where $a$ and $b$ are rational numbers.
He then approximated the only real root, probably using an algebraic technique of Omar Khayyam
involving the intersection of a circle and a parabola. His answer was given in the base-60 number
system as

\[ 1 + 22\left(\frac{1}{60}\right) + 7\left(\frac{1}{60}\right)^2 + 42\left(\frac{1}{60}\right)^3 + 33\left(\frac{1}{60}\right)^4 + 4\left(\frac{1}{60}\right)^5 + 40\left(\frac{1}{60}\right)^6. \]

How accurate was his approximation?
2.7 Survey of Methods and Software


In this chapter we have considered the problem of solving the equation f(x) = 0, where 
f is a given continuous function. All the methods begin with initial approximations and 
generate a sequence that converges to a root of the equation, if the method is successful. 
If [a,b] is an interval on which f(a) and f(b) are of opposite sign, then the Bisection 
method and the method of False Position will converge. However, the convergence of these 
methods might be slow. Faster convergence is generally obtained using the Secant method 
or Newton’s method. Good initial approximations are required for these methods, two for 
the Secant method and one for Newton’s method, so the root-bracketing techniques such 
as Bisection or the False Position method can be used as starter methods for the Secant or 
Newton’s method. 

Müller's method will give rapid convergence without a particularly good initial approximation.
It is not quite as efficient as Newton's method; its order of convergence near a root
is approximately $\alpha = 1.84$, compared to the quadratic, $\alpha = 2$, order of Newton's method.
However, it is better than the Secant method, whose order is approximately $\alpha = 1.62$, and
it has the added advantage of being able to approximate complex roots.

Deflation is generally used with Müller's method once an approximate root of a polynomial
has been determined. After an approximation to the root of the deflated equation has
been determined, use either Müller's method or Newton's method in the original polynomial
with this root as the initial approximation. This procedure will ensure that the root being
approximated is a solution to the true equation, not to the deflated equation. We recommend
Müller's method for finding all the zeros of polynomials, real or complex. Müller's
method can also be used for an arbitrary continuous function.

Other high-order methods are available for determining the roots of polynomials. If 
this topic is of particular interest, we recommend that consideration be given to Laguerre’s 
method, which gives cubic convergence and also approximates complex roots (see [Ho], 
pp. 176-179 for a complete discussion), the Jenkins-Traub method (see [JT]), and Brent’s 
method (see [Bre]). 

Another method of interest, Cauchy's method, is similar to Müller's method but avoids
the failure problem of Müller's method when $f(x_i) = f(x_{i+1}) = f(x_{i+2})$, for some $i$. For
an interesting discussion of this method, as well as more detail on Müller's method, we
recommend [YG], Sections 4.10, 4.11, and 5.4.

Given a specified function f and a tolerance, an efficient program should produce an 
approximation to one or more solutions of f(x) = 0, each having an absolute or relative 
error within the tolerance, and the results should be generated in a reasonable amount 
of time. If the program cannot accomplish this task, it should at least give meaningful 
explanations of why success was not obtained and an indication of how to remedy the cause 
of failure. 

IMSL has subroutines that implement Müller's method with deflation. Also included
in this package is a routine due to R. P. Brent that uses a combination of linear interpolation,
an inverse quadratic interpolation similar to Müller's method, and the Bisection method.
Laguerre's method is also used to find zeros of a real polynomial. Another routine for finding
the zeros of real polynomials uses a method of Jenkins-Traub, which is also used to find
zeros of a complex polynomial.

The NAG library has a subroutine that uses a combination of the Bisection method, 
linear interpolation, and extrapolation to approximate a real zero of a function on a 
given interval. NAG also supplies subroutines to approximate all zeros of a real poly- 
nomial or complex polynomial, respectively. Both subroutines use a modified Laguerre 
method. 
The netlib library contains a subroutine that uses a combination of the Bisection and
Secant method developed by T. J. Dekker to approximate a real zero of a function in the 
interval. It requires specifying an interval that contains a root and returns an interval with 
a width that is within a specified tolerance. Another subroutine uses a combination of the 
bisection method, interpolation, and extrapolation to find a real zero of the function on the 
interval. 

MATLAB has a routine to compute all the roots, both real and complex, of a polynomial, 
and one that computes a zero near a specified initial approximation to within a specified 
tolerance. 

Notice that in spite of the diversity of methods, the professionally written packages 
are based primarily on the methods and principles discussed in this chapter. You should be 
able to use these packages by reading the manuals accompanying the packages to better 
understand the parameters and the specifications of the results that are obtained. 

There are three books that we consider to be classics on the solution of nonlinear 
equations: those by Traub [Tr], by Ostrowski [Os], and by Householder [Ho]. In addition, 
the book by Brent [Bre] served as the basis for many of the currently used root-finding 
methods. 
Interpolation and Polynomial Approximation


Introduction 


A census of the population of the United States is taken every 10 years. The following 
table lists the population, in thousands of people, from 1950 to 2000, and the data are also 
represented in the figure. 


Year 1950 1960 1970 1980 1990 2000 
Population 151,326 179,323 203,302 226,542 249,633 281,422 
(in thousands) 
[Figure: plot of the population P(t) against the year t, 1950 to 2000; the vertical axis is marked at 1 × 10^8, 2 × 10^8, and 3 × 10^8.]


In reviewing these data, we might ask whether they could be used to provide a rea- 
sonable estimate of the population, say, in 1975 or even in the year 2020. Predictions of 
this type can be obtained by using a function that fits the given data. This process is called 
interpolation and is the subject of this chapter. This population problem is considered 
throughout the chapter and in Exercises 18 of Section 3.1, 18 of Section 3.3, and 28 of 


Section 3.5. 
3.1 Interpolation and the Lagrange Polynomial


Figure 3.1 


Theorem 3.1 


Karl Weierstrass (1815-1897) is 
often referred to as the father of 
modern analysis because of his 
insistence on rigor in the 
demonstration of mathematical 
results. He was instrumental in 
developing tests for convergence 
of series, and determining ways 
to rigorously define irrational 
numbers. He was the first to 
demonstrate that a function could 
be everywhere continuous but 
nowhere differentiable, a result 
that shocked some of his 
contemporaries. 


One of the most useful and well-known classes of functions mapping the set of real numbers 
into itself is the algebraic polynomials, the set of functions of the form 


\[ P_n(x) = a_n x^n + a_{n-1} x^{n-1} + \cdots + a_1 x + a_0, \]


where $n$ is a nonnegative integer and $a_0, \ldots, a_n$ are real constants. One reason for their
importance is that they uniformly approximate continuous functions. By this we mean that 
given any function, defined and continuous on a closed and bounded interval, there exists 
a polynomial that is as “close” to the given function as desired. This result is expressed 
precisely in the Weierstrass Approximation Theorem. (See Figure 3.1.) 


[Figure 3.1: the curves $y = f(x) + \epsilon$, $y = P(x)$, $y = f(x)$, and $y = f(x) - \epsilon$.]


(Weierstrass Approximation Theorem) 


Suppose that $f$ is defined and continuous on $[a, b]$. For each $\epsilon > 0$, there exists a polynomial
$P(x)$, with the property that

\[ |f(x) - P(x)| < \epsilon, \quad \text{for all } x \text{ in } [a, b]. \]


The proof of this theorem can be found in most elementary texts on real analysis (see, 
for example, [Bart], pp. 165-172). 

Another important reason for considering the class of polynomials in the approximation 
of functions is that the derivative and indefinite integral of a polynomial are easy to determine 
and are also polynomials. For these reasons, polynomials are often used for approximating 
continuous functions. 

The Taylor polynomials were introduced in Section 1.1, where they were described 
as one of the fundamental building blocks of numerical analysis. Given this prominence, 
you might expect that polynomial interpolation would make heavy use of these functions. 
However this is not the case. The Taylor polynomials agree as closely as possible with 
a given function at a specific point, but they concentrate their accuracy near that point. 
A good interpolation polynomial needs to provide a relatively accurate approximation 
over an entire interval, and Taylor polynomials do not generally do this. For example, 
suppose we calculate the first six Taylor polynomials about $x_0 = 0$ for $f(x) = e^x$.
Since the derivatives of $f(x)$ are all $e^x$, which evaluated at $x_0 = 0$ gives 1, the Taylor
polynomials are
Very little of Weierstrass's work
was published during his lifetime, 
but his lectures, particularly on 
the theory of functions, had 
significant influence on an entire 
generation of students. 


Figure 3.2 


Table 3.1 
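\[ P_0(x) = 1, \quad P_1(x) = 1 + x, \quad P_2(x) = 1 + x + \frac{x^2}{2}, \quad P_3(x) = 1 + x + \frac{x^2}{2} + \frac{x^3}{6}, \]

\[ P_4(x) = 1 + x + \frac{x^2}{2} + \frac{x^3}{6} + \frac{x^4}{24}, \quad \text{and} \quad P_5(x) = 1 + x + \frac{x^2}{2} + \frac{x^3}{6} + \frac{x^4}{24} + \frac{x^5}{120}. \]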


The graphs of the polynomials are shown in Figure 3.2. (Notice that even for the 
higher-degree polynomials, the error becomes progressively worse as we move away from 
zero.) 


Although better approximations are obtained for $f(x) = e^x$ if higher-degree Taylor
polynomials are used, this is not true for all functions. Consider, as an extreme example,
using Taylor polynomials of various degrees for $f(x) = 1/x$ expanded about $x_0 = 1$ to
approximate $f(3) = 1/3$. Since

\[ f(x) = x^{-1}, \quad f'(x) = -x^{-2}, \quad f''(x) = (-1)^2\, 2 \cdot x^{-3}, \]

and, in general,

\[ f^{(k)}(x) = (-1)^k k!\, x^{-k-1}, \]

the Taylor polynomials are

\[ P_n(x) = \sum_{k=0}^{n} \frac{f^{(k)}(1)}{k!} (x - 1)^k = \sum_{k=0}^{n} (-1)^k (x - 1)^k. \]

To approximate $f(3) = 1/3$ by $P_n(3)$ for increasing values of $n$, we obtain the values in
Table 3.1, rather a dramatic failure! When we approximate $f(3) = 1/3$ by $P_n(3)$ for larger
values of $n$, the approximations become increasingly inaccurate.


n       | 0 | 1  | 2 | 3  | 4  | 5   | 6  | 7
P_n(3)  | 1 | -1 | 3 | -5 | 11 | -21 | 43 | -85
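The entries of Table 3.1 follow directly from the closed form $P_n(x) = \sum_{k=0}^{n} (-1)^k (x - 1)^k$. The short Python sketch below (the function name `taylor_reciprocal` is an assumption made for this illustration) evaluates the partial sums at $x = 3$, where each term is $(-2)^k$ and the sums therefore oscillate with growing magnitude instead of approaching $1/3$.

```python
def taylor_reciprocal(x, n):
    """nth Taylor polynomial of f(x) = 1/x about x0 = 1, evaluated at x."""
    return sum((-1) ** k * (x - 1) ** k for k in range(n + 1))


# P_0(3), P_1(3), ..., P_7(3)
values = [taylor_reciprocal(3, n) for n in range(8)]
errors = [abs(v - 1 / 3) for v in values]   # distance from f(3) = 1/3
```

The list `errors` increases with `n`, which is the failure the table records: $x = 3$ lies outside the interval $0 < x < 2$ on which this geometric series converges.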
Example 1




For the Taylor polynomials all the information used in the approximation is concentrated
at the single number $x_0$, so these polynomials will generally give inaccurate approximations
as we move away from $x_0$. This limits Taylor polynomial approximation to the situation in
which approximations are needed only at numbers close to $x_0$. For ordinary computational
purposes it is more efficient to use methods that include information at various points. We 
consider this in the remainder of the chapter. The primary use of Taylor polynomials in 
numerical analysis is not for approximation purposes, but for the derivation of numerical 
techniques and error estimation. 


Lagrange Interpolating Polynomials 


The problem of determining a polynomial of degree one that passes through the distinct
points $(x_0, y_0)$ and $(x_1, y_1)$ is the same as approximating a function $f$ for which $f(x_0) = y_0$
and $f(x_1) = y_1$ by means of a first-degree polynomial interpolating, or agreeing with, the
values of $f$ at the given points. Using this polynomial for approximation within the interval
given by the endpoints is called polynomial interpolation.

Define the functions

\[ L_0(x) = \frac{x - x_1}{x_0 - x_1} \quad \text{and} \quad L_1(x) = \frac{x - x_0}{x_1 - x_0}. \]

The linear Lagrange interpolating polynomial through $(x_0, y_0)$ and $(x_1, y_1)$ is

\[ P(x) = L_0(x) f(x_0) + L_1(x) f(x_1) = \frac{x - x_1}{x_0 - x_1} f(x_0) + \frac{x - x_0}{x_1 - x_0} f(x_1). \]

Note that

\[ L_0(x_0) = 1, \quad L_0(x_1) = 0, \quad L_1(x_0) = 0, \quad \text{and} \quad L_1(x_1) = 1, \]

which implies that

\[ P(x_0) = 1 \cdot f(x_0) + 0 \cdot f(x_1) = f(x_0) = y_0 \]

and

\[ P(x_1) = 0 \cdot f(x_0) + 1 \cdot f(x_1) = f(x_1) = y_1. \]

So $P$ is the unique polynomial of degree at most one that passes through $(x_0, y_0)$ and
$(x_1, y_1)$.
Determine the linear Lagrange interpolating polynomial that passes through the points (2, 4)
and (5, 1).


Solution In this case we have

\[ L_0(x) = \frac{x - 5}{2 - 5} = -\frac{1}{3}(x - 5) \quad \text{and} \quad L_1(x) = \frac{x - 2}{5 - 2} = \frac{1}{3}(x - 2), \]

so

\[ P(x) = -\frac{1}{3}(x - 5) \cdot 4 + \frac{1}{3}(x - 2) \cdot 1 = -\frac{4}{3}x + \frac{20}{3} + \frac{1}{3}x - \frac{2}{3} = -x + 6. \]

The graph of $y = P(x)$ is shown in Figure 3.3.
Figure 3.3 [graph of $y = P(x) = -x + 6$]

To generalize the concept of linear interpolation, consider the construction of a polynomial
of degree at most $n$ that passes through the $n + 1$ points

\[ (x_0, f(x_0)), (x_1, f(x_1)), \ldots, (x_n, f(x_n)). \]

(See Figure 3.4.)

Figure 3.4


In this case we first construct, for each $k = 0, 1, \ldots, n$, a function $L_{n,k}(x)$ with the
property that $L_{n,k}(x_i) = 0$ when $i \neq k$ and $L_{n,k}(x_k) = 1$. To satisfy $L_{n,k}(x_i) = 0$ for each
$i \neq k$ requires that the numerator of $L_{n,k}(x)$ contain the term

\[ (x - x_0)(x - x_1) \cdots (x - x_{k-1})(x - x_{k+1}) \cdots (x - x_n). \]

To satisfy $L_{n,k}(x_k) = 1$, the denominator of $L_{n,k}(x)$ must be this same term but evaluated at
$x = x_k$. Thus

\[ L_{n,k}(x) = \frac{(x - x_0) \cdots (x - x_{k-1})(x - x_{k+1}) \cdots (x - x_n)}{(x_k - x_0) \cdots (x_k - x_{k-1})(x_k - x_{k+1}) \cdots (x_k - x_n)}. \]

A sketch of the graph of a typical $L_{n,k}$ (when $n$ is even) is shown in Figure 3.5.
Figure 3.5


Theorem 3.2 


The interpolation formula named 
for Joseph Louis Lagrange 
(1736-1813) was likely known 
by Isaac Newton around 1675, 
but it appears to first have been 
published in 1779 by Edward 
Waring (1736-1798). Lagrange 
wrote extensively on the subject 
of interpolation and his work had 
significant influence on later 
mathematicians. He published 
this result in 1795. 


The symbol $\prod$ is used to write
products compactly and parallels
the symbol $\sum$, which is used for
writing sums.


Example 2 






The interpolating polynomial is easily described once the form of $L_{n,k}$ is known. This
polynomial, called the nth Lagrange interpolating polynomial, is defined in the following
theorem.


If $x_0, x_1, \ldots, x_n$ are $n + 1$ distinct numbers and $f$ is a function whose values are given at
these numbers, then a unique polynomial $P(x)$ of degree at most $n$ exists with

\[ f(x_k) = P(x_k), \quad \text{for each } k = 0, 1, \ldots, n. \]

This polynomial is given by

\[ P(x) = f(x_0) L_{n,0}(x) + \cdots + f(x_n) L_{n,n}(x) = \sum_{k=0}^{n} f(x_k) L_{n,k}(x), \tag{3.1} \]

where, for each $k = 0, 1, \ldots, n$,

\[ L_{n,k}(x) = \frac{(x - x_0)(x - x_1) \cdots (x - x_{k-1})(x - x_{k+1}) \cdots (x - x_n)}{(x_k - x_0)(x_k - x_1) \cdots (x_k - x_{k-1})(x_k - x_{k+1}) \cdots (x_k - x_n)} = \prod_{\substack{i=0 \\ i \neq k}}^{n} \frac{(x - x_i)}{(x_k - x_i)}. \]

We will write $L_{n,k}(x)$ simply as $L_k(x)$ when there is no confusion as to its degree.
(a) Use the numbers (called nodes) $x_0 = 2$, $x_1 = 2.75$, and $x_2 = 4$ to find the second
Lagrange interpolating polynomial for $f(x) = 1/x$.
(b) Use this polynomial to approximate $f(3) = 1/3$.


Solution (a) We first determine the coefficient polynomials $L_0(x)$, $L_1(x)$, and $L_2(x)$. In
nested form they are

\[ L_0(x) = \frac{(x - 2.75)(x - 4)}{(2 - 2.75)(2 - 4)} = \frac{2}{3}(x - 2.75)(x - 4), \]

\[ L_1(x) = \frac{(x - 2)(x - 4)}{(2.75 - 2)(2.75 - 4)} = -\frac{16}{15}(x - 2)(x - 4), \]

and

\[ L_2(x) = \frac{(x - 2)(x - 2.75)}{(4 - 2)(4 - 2.75)} = \frac{2}{5}(x - 2)(x - 2.75). \]
Also, $f(x_0) = f(2) = 1/2$, $f(x_1) = f(2.75) = 4/11$, and $f(x_2) = f(4) = 1/4$, so

\begin{align*}
P(x) &= \sum_{k=0}^{2} f(x_k) L_k(x) \\
&= \frac{1}{3}(x - 2.75)(x - 4) - \frac{64}{165}(x - 2)(x - 4) + \frac{1}{10}(x - 2)(x - 2.75) \\
&= \frac{1}{22}x^2 - \frac{35}{88}x + \frac{49}{44}.
\end{align*}

(b) An approximation to $f(3) = 1/3$ (see Figure 3.6) is

\[ f(3) \approx P(3) = \frac{9}{22} - \frac{105}{88} + \frac{49}{44} = \frac{29}{88} \approx 0.32955. \]

Recall that in the opening section of this chapter (see Table 3.1) we found that no Taylor
polynomial expanded about $x_0 = 1$ could be used to reasonably approximate $f(x) = 1/x$
at $x = 3$.
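The computation in Example 2 can be reproduced by evaluating Eq. (3.1) directly. The Python sketch below is illustrative only: the function name `lagrange_interpolate` is an assumption, and exact rational arithmetic from the standard `fractions` module is used so that the result can be compared with $29/88$ without rounding.

```python
from fractions import Fraction


def lagrange_interpolate(nodes, values, x):
    """Evaluate the Lagrange interpolating polynomial of Eq. (3.1) at x.

    nodes  : distinct numbers x_0, ..., x_n
    values : f(x_0), ..., f(x_n)
    """
    total = 0
    for k, (xk, fk) in enumerate(zip(nodes, values)):
        term = fk
        for i, xi in enumerate(nodes):
            if i != k:
                # Build L_{n,k}(x) one factor (x - x_i)/(x_k - x_i) at a time.
                term = term * (x - xi) / (xk - xi)
        total += term
    return total


# Example 2: nodes 2, 2.75, 4 for f(x) = 1/x, evaluated at x = 3.
nodes = [Fraction(2), Fraction(11, 4), Fraction(4)]
values = [1 / xk for xk in nodes]
approx = lagrange_interpolate(nodes, values, Fraction(3))   # exact rational
error = abs(Fraction(1, 3) - approx)
```

Evaluating the sum this way costs on the order of $n^2$ operations per point; it is meant to mirror the formula, not to be the most efficient way to interpolate.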


The interpolating polynomial P of degree less than or equal to 2 is defined in Maple
with

    P := x -> interp([2, 11/4, 4], [1/2, 4/11, 1/4], x)

    x -> interp([2, 11/4, 4], [1/2, 4/11, 1/4], x)

To see the polynomial, enter

    P(x)

    (1/22) x^2 - (35/88) x + 49/44

Evaluating P(3) as an approximation to f(3) = 1/3 is done with

    evalf(P(3))

    0.3295454545
Theorem 3.3


There are other ways that the 
error term for the Lagrange 
polynomial can be expressed, but 
this is the most useful form and 
the one that most closely agrees 
with the standard Taylor 
polynomial error form. 




The interpolating polynomial can also be defined in Maple using the CurveFitting package 
and the call PolynomialInterpolation. 

The next step is to calculate a remainder term or bound for the error involved in 
approximating a function by an interpolating polynomial. 


Suppose $x_0, x_1, \ldots, x_n$ are distinct numbers in the interval $[a, b]$ and $f \in C^{n+1}[a, b]$. Then,
for each $x$ in $[a, b]$, a number $\xi(x)$ (generally unknown) between $x_0, x_1, \ldots, x_n$, and hence
in $(a, b)$, exists with

\[ f(x) = P(x) + \frac{f^{(n+1)}(\xi(x))}{(n+1)!}(x - x_0)(x - x_1) \cdots (x - x_n), \tag{3.3} \]

where $P(x)$ is the interpolating polynomial given in Eq. (3.1).


Proof Note first that if $x = x_k$, for any $k = 0, 1, \ldots, n$, then $f(x_k) = P(x_k)$, and choosing
$\xi(x_k)$ arbitrarily in $(a, b)$ yields Eq. (3.3).
If $x \ne x_k$, for all $k = 0, 1, \ldots, n$, define the function $g$ for $t$ in $[a, b]$ by

\[ g(t) = f(t) - P(t) - [f(x) - P(x)] \frac{(t - x_0)(t - x_1)\cdots(t - x_n)}{(x - x_0)(x - x_1)\cdots(x - x_n)}
        = f(t) - P(t) - [f(x) - P(x)] \prod_{i=0}^{n} \frac{(t - x_i)}{(x - x_i)}. \]

Since $f \in C^{n+1}[a, b]$ and $P \in C^{\infty}[a, b]$, it follows that $g \in C^{n+1}[a, b]$. For $t = x_k$, we have

\[ g(x_k) = f(x_k) - P(x_k) - [f(x) - P(x)] \prod_{i=0}^{n} \frac{(x_k - x_i)}{(x - x_i)} = 0 - [f(x) - P(x)] \cdot 0 = 0. \]

Moreover,

\[ g(x) = f(x) - P(x) - [f(x) - P(x)] \prod_{i=0}^{n} \frac{(x - x_i)}{(x - x_i)} = f(x) - P(x) - [f(x) - P(x)] = 0. \]

Thus $g \in C^{n+1}[a, b]$, and $g$ is zero at the $n + 2$ distinct numbers $x, x_0, x_1, \ldots, x_n$. By
Generalized Rolle's Theorem 1.10, there exists a number $\xi$ in $(a, b)$ for which $g^{(n+1)}(\xi) = 0$.
So

\[ 0 = g^{(n+1)}(\xi) = f^{(n+1)}(\xi) - P^{(n+1)}(\xi) - [f(x) - P(x)] \frac{d^{n+1}}{dt^{n+1}} \left[ \prod_{i=0}^{n} \frac{(t - x_i)}{(x - x_i)} \right]_{t=\xi}. \qquad (3.4) \]

However, $P(x)$ is a polynomial of degree at most $n$, so the $(n+1)$st derivative, $P^{(n+1)}(x)$,
is identically zero. Also, $\prod_{i=0}^{n} [(t - x_i)/(x - x_i)]$ is a polynomial of degree $(n + 1)$, so

\[ \prod_{i=0}^{n} \frac{(t - x_i)}{(x - x_i)} = \left[ \frac{1}{\prod_{i=0}^{n} (x - x_i)} \right] t^{n+1} + (\text{lower-degree terms in } t), \]

and

\[ \frac{d^{n+1}}{dt^{n+1}} \prod_{i=0}^{n} \frac{(t - x_i)}{(x - x_i)} = \frac{(n+1)!}{\prod_{i=0}^{n} (x - x_i)}. \]

Equation (3.4) now becomes

\[ 0 = f^{(n+1)}(\xi) - 0 - [f(x) - P(x)] \frac{(n+1)!}{\prod_{i=0}^{n} (x - x_i)}, \]

and, upon solving for $f(x)$, we have

\[ f(x) = P(x) + \frac{f^{(n+1)}(\xi)}{(n+1)!} \prod_{i=0}^{n} (x - x_i). \]
The error formula in Theorem 3.3 is an important theoretical result because Lagrange 
polynomials are used extensively for deriving numerical differentiation and integration 
methods. Error bounds for these techniques are obtained from the Lagrange error formula. 
Note that the error form for the Lagrange polynomial is quite similar to that for the Taylor
polynomial. The nth Taylor polynomial about $x_0$ concentrates all the known information
at $x_0$ and has an error term of the form

\[ \frac{f^{(n+1)}(\xi(x))}{(n+1)!}(x - x_0)^{n+1}. \]

The Lagrange polynomial of degree $n$ uses information at the distinct numbers $x_0, x_1, \ldots, x_n$
and, in place of $(x - x_0)^{n+1}$, its error formula uses a product of the $n + 1$ terms $(x - x_0)$,
$(x - x_1), \ldots, (x - x_n)$:

\[ \frac{f^{(n+1)}(\xi(x))}{(n+1)!}(x - x_0)(x - x_1)\cdots(x - x_n). \]


Example 3 In Example 2 we found the second Lagrange polynomial for $f(x) = 1/x$ on $[2, 4]$ using the
nodes $x_0 = 2$, $x_1 = 2.75$, and $x_2 = 4$. Determine the error form for this polynomial, and
the maximum error when the polynomial is used to approximate $f(x)$ for $x \in [2, 4]$.

Solution Because $f(x) = x^{-1}$, we have

\[ f'(x) = -x^{-2}, \quad f''(x) = 2x^{-3}, \quad \text{and} \quad f'''(x) = -6x^{-4}. \]

As a consequence, the second Lagrange polynomial has the error form

\[ \frac{f'''(\xi(x))}{3!}(x - x_0)(x - x_1)(x - x_2) = -(\xi(x))^{-4}(x - 2)(x - 2.75)(x - 4), \quad \text{for } \xi(x) \text{ in } (2, 4). \]

The maximum value of $(\xi(x))^{-4}$ on the interval is $2^{-4} = 1/16$. We now need to determine
the maximum value on this interval of the absolute value of the polynomial

\[ g(x) = (x - 2)(x - 2.75)(x - 4) = x^3 - \frac{35}{4}x^2 + \frac{49}{2}x - 22. \]

Because

\[ D_x\left( x^3 - \frac{35}{4}x^2 + \frac{49}{2}x - 22 \right) = 3x^2 - \frac{35}{2}x + \frac{49}{2} = \frac{1}{2}(3x - 7)(2x - 7), \]

the critical points occur at

\[ x = \frac{7}{3}, \text{ with } g\left(\frac{7}{3}\right) = \frac{25}{108}, \quad \text{and} \quad x = \frac{7}{2}, \text{ with } g\left(\frac{7}{2}\right) = -\frac{9}{16}. \]

Since $g(2) = g(4) = 0$, the maximum of $|g(x)|$ on $[2, 4]$ is $9/16$. The factor $3!$ has already been
absorbed into $|f'''(\xi(x))/3!| = (\xi(x))^{-4} \le 1/16$, so the maximum error is

\[ \left| \frac{f'''(\xi(x))}{3!} \right| \, |(x - x_0)(x - x_1)(x - x_2)| \le \frac{1}{16} \left| -\frac{9}{16} \right| = \frac{9}{256} \approx 0.03516. \]

The next example illustrates how the error formula can be used to prepare a table of
data that will ensure a specified interpolation error within a specified bound.

Example 4 Suppose a table is to be prepared for the function $f(x) = e^x$, for $x$ in $[0, 1]$. Assume the
number of decimal places to be given per entry is $d \ge 8$ and that the difference between
adjacent x-values, the step size, is $h$. What step size $h$ will ensure that linear interpolation
gives an absolute error of at most $10^{-6}$ for all $x$ in $[0, 1]$?

Solution Let $x_0, x_1, \ldots$ be the numbers at which $f$ is evaluated, $x$ be in $[0, 1]$, and suppose
$j$ satisfies $x_j \le x \le x_{j+1}$. Eq. (3.3) implies that the error in linear interpolation is

\[ |f(x) - P(x)| = \left| \frac{f^{(2)}(\xi)}{2!}(x - x_j)(x - x_{j+1}) \right| = \frac{|f^{(2)}(\xi)|}{2} |x - x_j| \, |x - x_{j+1}|. \]

The step size is $h$, so $x_j = jh$, $x_{j+1} = (j + 1)h$, and

\[ |f(x) - P(x)| \le \frac{|f^{(2)}(\xi)|}{2!} |(x - jh)(x - (j + 1)h)|. \]

Hence

\[ |f(x) - P(x)| \le \frac{\max_{\xi \in [0,1]} e^{\xi}}{2} \max_{x_j \le x \le x_{j+1}} |(x - jh)(x - (j + 1)h)|
                 \le \frac{e}{2} \max_{x_j \le x \le x_{j+1}} |(x - jh)(x - (j + 1)h)|. \]

Consider the function $g(x) = (x - jh)(x - (j + 1)h)$, for $jh \le x \le (j + 1)h$. Because

\[ g'(x) = (x - (j + 1)h) + (x - jh) = 2\left( x - jh - \frac{h}{2} \right), \]

the only critical point for $g$ is at $x = jh + h/2$, with $|g(jh + h/2)| = (h/2)^2 = h^2/4$.
Since $g(jh) = 0$ and $g((j + 1)h) = 0$, the maximum value of $|g(x)|$ in $[jh, (j + 1)h]$
must occur at the critical point, which implies that

\[ |f(x) - P(x)| \le \frac{e}{2} \max_{x_j \le x \le x_{j+1}} |g(x)| \le \frac{e}{2} \cdot \frac{h^2}{4} = \frac{e h^2}{8}. \]

Consequently, to ensure that the error in linear interpolation is bounded by $10^{-6}$, it is
sufficient for $h$ to be chosen so that

\[ \frac{e h^2}{8} \le 10^{-6}. \quad \text{This implies that} \quad h < 1.72 \times 10^{-3}. \]

Because $n = (1 - 0)/h$ must be an integer, a reasonable choice for the step size is
$h = 0.001$.
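The step-size argument of Example 4 can be checked numerically. The sketch below is an illustration only, under the assumption that sampling each subinterval at its midpoint (where $|g|$ attains its maximum) is enough to estimate the worst interpolation error; the variable names are hypothetical.

```python
import math

tol = 1e-6
# Largest step size allowed by the bound e*h^2/8 <= tol.
h_max = math.sqrt(8 * tol / math.e)

h = 0.001                      # the step size chosen in the example
n = round(1 / h)               # number of subintervals of [0, 1]

worst = 0.0
for j in range(n):
    xj, xj1 = j * h, (j + 1) * h
    fj, fj1 = math.exp(xj), math.exp(xj1)
    # The node polynomial |(x - x_j)(x - x_{j+1})| peaks at the midpoint,
    # so the midpoint is used as a proxy for the worst point of the subinterval.
    xm = 0.5 * (xj + xj1)
    linear = fj + (fj1 - fj) * (xm - xj) / h
    worst = max(worst, abs(math.exp(xm) - linear))

print(f"h_max ~ {h_max:.3e}")
print(f"bound e*h^2/8 = {math.e * h * h / 8:.3e}, observed worst error ~ {worst:.3e}")
```

The observed error stays below the bound $eh^2/8$, which in turn is below the tolerance $10^{-6}$, so $h = 0.001$ is a safe (slightly conservative) choice.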


EXERCISE SET 3.1 


1. For the given functions $f(x)$, let $x_0 = 0$, $x_1 = 0.6$, and $x_2 = 0.9$. Construct interpolation polynomials
of degree at most one and at most two to approximate $f(0.45)$, and find the absolute error.

a. $f(x) = \cos x$
b. $f(x) = \sqrt{1 + x}$
c. $f(x) = \ln(x + 1)$
d. $f(x) = \tan x$

2. For the given functions $f(x)$, let $x_0 = 1$, $x_1 = 1.25$, and $x_2 = 1.6$. Construct interpolation polynomials
of degree at most one and at most two to approximate $f(1.4)$, and find the absolute error.

a. $f(x) = \sin \pi x$
b. $f(x) = \sqrt[3]{x - 1}$
c. $f(x) = \log_{10}(3x - 1)$
d. $f(x) = e^{2x} - x$


3. Use Theorem 3.3 to find an error bound for the approximations in Exercise 1. 
4. Use Theorem 3.3 to find an error bound for the approximations in Exercise 2. 


5. Use appropriate Lagrange interpolating polynomials of degrees one, two, and three to approximate 
each of the following: 


a. $f(8.4)$ if $f(8.1) = 16.94410$, $f(8.3) = 17.56492$, $f(8.6) = 18.50515$, $f(8.7) = 18.82091$

b. $f(-1/3)$ if $f(-0.75) = -0.07181250$, $f(-0.5) = -0.02475000$, $f(-0.25) = 0.33493750$,
$f(0) = 1.10100000$

c. $f(0.25)$ if $f(0.1) = 0.62049958$, $f(0.2) = -0.28398668$, $f(0.3) = 0.00660095$, $f(0.4) = 0.24842440$

d. $f(0.9)$ if $f(0.6) = -0.17694460$, $f(0.7) = 0.01375227$, $f(0.8) = 0.22363362$, $f(1.0) = 0.65809197$


6. Use appropriate Lagrange interpolating polynomials of degrees one, two, and three to approximate 
each of the following: 


a. $f(0.43)$ if $f(0) = 1$, $f(0.25) = 1.64872$, $f(0.5) = 2.71828$, $f(0.75) = 4.48169$

b. $f(0)$ if $f(-0.5) = 1.93750$, $f(-0.25) = 1.33203$, $f(0.25) = 0.800781$, $f(0.5) = 0.687500$

c. $f(0.18)$ if $f(0.1) = -0.29004986$, $f(0.2) = -0.56079734$, $f(0.3) = -0.81401972$, $f(0.4) = -1.0526302$

d. $f(0.25)$ if $f(-1) = 0.86199480$, $f(-0.5) = 0.95802009$, $f(0) = 1.0986123$, $f(0.5) = 1.2943767$


7. The data for Exercise 5 were generated using the following functions. Use the error formula to find a
bound for the error, and compare the bound to the actual error for the cases $n = 1$ and $n = 2$.

a. $f(x) = x \ln x$
b. $f(x) = x^3 + 4.001x^2 + 4.002x + 1.101$
c. $f(x) = x \cos x - 2x^2 + 3x - 1$
d. $f(x) = \sin(e^x - 2)$

8. The data for Exercise 6 were generated using the following functions. Use the error formula to find a
bound for the error, and compare the bound to the actual error for the cases $n = 1$ and $n = 2$.

a. $f(x) = e^{2x}$
b. $f(x) = x^4 - x^3 + x^2 - x + 1$
c. $f(x) = x^2 \cos x - 3x$
d. $f(x) = \ln(e^x + 2)$

9. Let $P_3(x)$ be the interpolating polynomial for the data $(0, 0)$, $(0.5, y)$, $(1, 3)$, and $(2, 2)$. The coefficient
of $x^3$ in $P_3(x)$ is 6. Find $y$.

10. Let $f(x) = \sqrt{x - x^2}$ and $P_2(x)$ be the interpolation polynomial on $x_0 = 0$, $x_1$, and $x_2 = 1$. Find the
largest value of $x_1$ in $(0, 1)$ for which $f(0.5) - P_2(0.5) = -0.25$.


11. Use the following values and four-digit rounding arithmetic to construct a third Lagrange polynomial
approximation to $f(1.09)$. The function being approximated is $f(x) = \log_{10}(\tan x)$. Use this
knowledge to find a bound for the error in the approximation.

$f(1.00) = 0.1924$   $f(1.05) = 0.2414$   $f(1.10) = 0.2933$   $f(1.15) = 0.3492$

12. Use the Lagrange interpolating polynomial of degree three or less and four-digit chopping arithmetic
to approximate $\cos 0.750$ using the following values. Find an error bound for the approximation.

$\cos 0.698 = 0.7661$   $\cos 0.733 = 0.7432$   $\cos 0.768 = 0.7193$   $\cos 0.803 = 0.6946$

The actual value of $\cos 0.750$ is 0.7317 (to four decimal places). Explain the discrepancy between the
actual error and the error bound.

13. Construct the Lagrange interpolating polynomials for the following functions, and find a bound for
the absolute error on the interval $[x_0, x_n]$.

a. $f(x) = e^{2x} \cos 3x$, $x_0 = 0$, $x_1 = 0.3$, $x_2 = 0.6$, $n = 2$
b. $f(x) = \sin(\ln x)$, $x_0 = 2.0$, $x_1 = 2.4$, $x_2 = 2.6$, $n = 2$
c. $f(x) = \ln x$, $x_0 = 1$, $x_1 = 1.1$, $x_2 = 1.3$, $x_3 = 1.4$, $n = 3$
d. $f(x) = \cos x + \sin x$, $x_0 = 0$, $x_1 = 0.25$, $x_2 = 0.5$, $x_3 = 1.0$, $n = 3$

14. Let $f(x) = e^x$, for $0 \le x \le 2$.

a. Approximate $f(0.25)$ using linear interpolation with $x_0 = 0$ and $x_1 = 0.5$.
b. Approximate $f(0.75)$ using linear interpolation with $x_0 = 0.5$ and $x_1 = 1$.
c. Approximate $f(0.25)$ and $f(0.75)$ by using the second interpolating polynomial with $x_0 = 0$,
$x_1 = 1$, and $x_2 = 2$.
d. Which approximations are better and why?

15. Repeat Exercise 11 using Maple with Digits set to 10.

16. Repeat Exercise 12 using Maple with Digits set to 10.

17. Suppose you need to construct eight-decimal-place tables for the common, or base-10, logarithm
function from $x = 1$ to $x = 10$ in such a way that linear interpolation is accurate to within $10^{-6}$.
Determine a bound for the step size for this table. What choice of step size would you make to ensure
that $x = 10$ is included in the table?

18. a. The introduction to this chapter included a table listing the population of the United States from
1950 to 2000. Use Lagrange interpolation to approximate the population in the years 1940, 1975,
and 2020.
b. The population in 1940 was approximately 132,165,000. How accurate do you think your 1975
and 2020 figures are?

19. It is suspected that the high amounts of tannin in mature oak leaves inhibit the growth of the winter
moth (Operophtera bromata L., Geometridae) larvae that extensively damage these trees in certain
years. The following table lists the average weight of two samples of larvae at times in the first 28
days after birth. The first sample was reared on young oak leaves, whereas the second sample was
reared on mature leaves from the same tree.

a. Use Lagrange interpolation to approximate the average weight curve for each sample.
b. Find an approximate maximum average weight for each sample by determining the maximum
of the interpolating polynomial.

Day                          | 0    | 6     | 10    | 13    | 17    | 20    | 28
Sample 1 average weight (mg) | 6.67 | 17.33 | 42.67 | 37.33 | 30.10 | 29.31 | 28.74
Sample 2 average weight (mg) | 6.67 | 16.11 | 18.89 | 15.00 | 10.56 | 9.44  | 8.89

20. In Exercise 26 of Section 1.1 a Maclaurin series was integrated to approximate erf(1), where erf(x) is
the normal distribution error function defined by

\[ \operatorname{erf}(x) = \frac{2}{\sqrt{\pi}} \int_0^x e^{-t^2} \, dt. \]

a. Use the Maclaurin series to construct a table for erf(x) that is accurate to within $10^{-4}$ for $\operatorname{erf}(x_i)$,
where $x_i = 0.2i$, for $i = 0, 1, \ldots, 5$.
b. Use both linear interpolation and quadratic interpolation to obtain an approximation to $\operatorname{erf}(1/3)$.
Which approach seems most feasible?

21. Prove Taylor's Theorem 1.14 by following the procedure in the proof of Theorem 3.3. [Hint: Let

\[ g(t) = f(t) - P(t) - [f(x) - P(x)] \cdot \frac{(t - x_0)^{n+1}}{(x - x_0)^{n+1}}, \]

where $P$ is the nth Taylor polynomial, and use the Generalized Rolle's Theorem 1.10.]

22. Show that $\max_{x_j \le x \le x_{j+1}} |g(x)| = h^2/4$, where $g(x) = (x - jh)(x - (j + 1)h)$.

23. The Bernstein polynomial of degree $n$ for $f \in C[0, 1]$ is given by

\[ B_n(x) = \sum_{k=0}^{n} \binom{n}{k} f\left(\frac{k}{n}\right) x^k (1 - x)^{n-k}, \]

where $\binom{n}{k}$ denotes $n!/(k!(n - k)!)$. These polynomials can be used in a constructive proof of the
Weierstrass Approximation Theorem 3.1 (see [Bart]) because $\lim_{n \to \infty} B_n(x) = f(x)$, for each $x \in [0, 1]$.

a. Find $B_3(x)$ for the functions
i. $f(x) = x$   ii. $f(x) = 1$

b. Show that for each $k \le n$,

\[ \binom{n-1}{k-1} = \left(\frac{k}{n}\right) \binom{n}{k}. \]

c. Use part (b) and the fact, from (ii) in part (a), that

\[ 1 = \sum_{k=0}^{n} \binom{n}{k} x^k (1 - x)^{n-k}, \quad \text{for each } n, \]

to show that, for $f(x) = x^2$,

\[ B_n(x) = \left(\frac{n-1}{n}\right) x^2 + \frac{1}{n} x. \]

d. Use part (c) to estimate the value of $n$ necessary for $|B_n(x) - x^2| \le 10^{-6}$ to hold for all $x$ in
$[0, 1]$.


3.2 Data Approximation and Neville's Method


In the previous section we found an explicit representation for Lagrange polynomials and 
their error when approximating a function on an interval. A frequent use of these polynomials 
involves the interpolation of tabulated data. In this case an explicit representation of the 
polynomial might not be needed, only the values of the polynomial at specified points. In 
this situation the function underlying the data might not be known so the explicit form of 
the error cannot be used. We will now illustrate a practical application of interpolation in 
such a situation. 


Illustration Table 3.2 lists values of a function $f$ at various points. The approximations to $f(1.5)$
obtained by various Lagrange polynomials that use this data will be compared to try to
determine the accuracy of the approximation.

Table 3.2
x     f(x)
1.0   0.7651977
1.3   0.6200860
1.6   0.4554022
1.9   0.2818186
2.2   0.1103623

The most appropriate linear polynomial uses $x_0 = 1.3$ and $x_1 = 1.6$ because 1.5 is between
1.3 and 1.6. The value of the interpolating polynomial at 1.5 is

\[ P_1(1.5) = \frac{(1.5 - 1.6)}{(1.3 - 1.6)}(0.6200860) + \frac{(1.5 - 1.3)}{(1.6 - 1.3)}(0.4554022) = 0.5102968. \]

Two polynomials of degree 2 can reasonably be used, one with $x_0 = 1.3$, $x_1 = 1.6$, and
$x_2 = 1.9$, which gives

\[ P_2(1.5) = \frac{(1.5 - 1.6)(1.5 - 1.9)}{(1.3 - 1.6)(1.3 - 1.9)}(0.6200860) + \frac{(1.5 - 1.3)(1.5 - 1.9)}{(1.6 - 1.3)(1.6 - 1.9)}(0.4554022)
   + \frac{(1.5 - 1.3)(1.5 - 1.6)}{(1.9 - 1.3)(1.9 - 1.6)}(0.2818186) = 0.5112857, \]

and one with $x_0 = 1.0$, $x_1 = 1.3$, and $x_2 = 1.6$, which gives $\hat{P}_2(1.5) = 0.5124715$.

In the third-degree case, there are also two reasonable choices for the polynomial. One uses
$x_0 = 1.3$, $x_1 = 1.6$, $x_2 = 1.9$, and $x_3 = 2.2$, which gives $P_3(1.5) = 0.5118302$.

The second third-degree approximation is obtained with $x_0 = 1.0$, $x_1 = 1.3$, $x_2 = 1.6$,
and $x_3 = 1.9$, which gives $\hat{P}_3(1.5) = 0.5118127$. The fourth-degree Lagrange polynomial
uses all the entries in the table. With $x_0 = 1.0$, $x_1 = 1.3$, $x_2 = 1.6$, $x_3 = 1.9$, and $x_4 = 2.2$,
the approximation is $P_4(1.5) = 0.5118200$.

Because $P_3(1.5)$, $\hat{P}_3(1.5)$, and $P_4(1.5)$ all agree to within $2 \times 10^{-5}$ units, we expect
this degree of accuracy for these approximations. We also expect $P_4(1.5)$ to be the most
accurate approximation, since it uses more of the given data.

The function we are approximating is actually the Bessel function of the first kind of
order zero, whose value at 1.5 is known to be 0.5118277. Therefore, the true accuracies of
the approximations are as follows:

\[ |P_1(1.5) - f(1.5)| \approx 1.53 \times 10^{-3}, \]
\[ |P_2(1.5) - f(1.5)| \approx 5.42 \times 10^{-4}, \]
\[ |\hat{P}_2(1.5) - f(1.5)| \approx 6.44 \times 10^{-4}, \]
\[ |P_3(1.5) - f(1.5)| \approx 2.5 \times 10^{-6}, \]
\[ |\hat{P}_3(1.5) - f(1.5)| \approx 1.50 \times 10^{-5}, \]
\[ |P_4(1.5) - f(1.5)| \approx 7.7 \times 10^{-6}. \]

Although $P_3(1.5)$ is the most accurate approximation, if we had no knowledge of the actual
value of $f(1.5)$, we would accept $P_4(1.5)$ as the best approximation since it includes the
most data about the function. The Lagrange error term derived in Theorem 3.3 cannot be
applied here because we have no knowledge of the fourth derivative of $f$. Unfortunately,
this is generally the case.

Neville's Method

A practical difficulty with Lagrange interpolation is that the error term is difficult to apply,
so the degree of the polynomial needed for the desired accuracy is generally not known
until computations have been performed. A common practice is to compute the results
given from various polynomials until appropriate agreement is obtained, as was done in
the previous Illustration. However, the work done in calculating the approximation by the
second polynomial does not lessen the work needed to calculate the third approximation;
nor is the fourth approximation easier to obtain once the third approximation is known,
and so on. We will now derive these approximating polynomials in a manner that uses the
previous calculations to greater advantage.

Definition 3.4 Let $f$ be a function defined at $x_0, x_1, x_2, \ldots, x_n$, and suppose that $m_1, m_2, \ldots, m_k$ are $k$
distinct integers, with $0 \le m_i \le n$ for each $i$. The Lagrange polynomial that agrees with
$f(x)$ at the $k$ points $x_{m_1}, x_{m_2}, \ldots, x_{m_k}$ is denoted $P_{m_1, m_2, \ldots, m_k}(x)$.

Example 1 Suppose that $x_0 = 1$, $x_1 = 2$, $x_2 = 3$, $x_3 = 4$, $x_4 = 6$, and $f(x) = e^x$. Determine the
interpolating polynomial denoted $P_{1,2,4}(x)$, and use this polynomial to approximate $f(5)$.

Solution This is the Lagrange polynomial that agrees with $f(x)$ at $x_1 = 2$, $x_2 = 3$, and
$x_4 = 6$. Hence

\[ P_{1,2,4}(x) = \frac{(x - 3)(x - 6)}{(2 - 3)(2 - 6)} e^2 + \frac{(x - 2)(x - 6)}{(3 - 2)(3 - 6)} e^3 + \frac{(x - 2)(x - 3)}{(6 - 2)(6 - 3)} e^6. \]

So

\[ f(5) \approx P(5) = \frac{(5 - 3)(5 - 6)}{(2 - 3)(2 - 6)} e^2 + \frac{(5 - 2)(5 - 6)}{(3 - 2)(3 - 6)} e^3 + \frac{(5 - 2)(5 - 3)}{(6 - 2)(6 - 3)} e^6
   = -\frac{1}{2} e^2 + e^3 + \frac{1}{2} e^6 \approx 218.105. \]

The next result describes a method for recursively generating Lagrange polynomial 
approximations. 


Theorem 3.5 Let $f$ be defined at $x_0, x_1, \ldots, x_k$, and let $x_j$ and $x_i$ be two distinct numbers in this set. Then

\[ P(x) = \frac{(x - x_j) P_{0,1,\ldots,j-1,j+1,\ldots,k}(x) - (x - x_i) P_{0,1,\ldots,i-1,i+1,\ldots,k}(x)}{(x_i - x_j)} \]

is the kth Lagrange polynomial that interpolates $f$ at the $k + 1$ points $x_0, x_1, \ldots, x_k$.

Proof For ease of notation, let $Q \equiv P_{0,1,\ldots,i-1,i+1,\ldots,k}$ and $\hat{Q} \equiv P_{0,1,\ldots,j-1,j+1,\ldots,k}$. Since $Q(x)$
and $\hat{Q}(x)$ are polynomials of degree $k - 1$ or less, $P(x)$ is of degree at most $k$.
First note that $\hat{Q}(x_i) = f(x_i)$ implies that

\[ P(x_i) = \frac{(x_i - x_j)\hat{Q}(x_i) - (x_i - x_i)Q(x_i)}{x_i - x_j} = \frac{(x_i - x_j)}{(x_i - x_j)} f(x_i) = f(x_i). \]

Similarly, since $Q(x_j) = f(x_j)$, we have $P(x_j) = f(x_j)$.
In addition, if $0 \le r \le k$ and $r$ is neither $i$ nor $j$, then $Q(x_r) = \hat{Q}(x_r) = f(x_r)$. So

\[ P(x_r) = \frac{(x_r - x_j)\hat{Q}(x_r) - (x_r - x_i)Q(x_r)}{x_i - x_j} = \frac{(x_i - x_j)}{(x_i - x_j)} f(x_r) = f(x_r). \]

But, by definition, $P_{0,1,\ldots,k}(x)$ is the unique polynomial of degree at most $k$ that agrees with
$f$ at $x_0, x_1, \ldots, x_k$. Thus, $P \equiv P_{0,1,\ldots,k}$.


Theorem 3.5 implies that the interpolating polynomials can be generated recursively.
For example, we have

\[ P_{0,1} = \frac{1}{x_1 - x_0} [(x - x_0)P_1 - (x - x_1)P_0], \qquad P_{1,2} = \frac{1}{x_2 - x_1} [(x - x_1)P_2 - (x - x_2)P_1], \]

\[ P_{0,1,2} = \frac{1}{x_2 - x_0} [(x - x_0)P_{1,2} - (x - x_2)P_{0,1}], \]

and so on. They are generated in the manner shown in Table 3.3, where each row is completed
before the succeeding rows are begun.

Table 3.3
$x_0$   $P_0$
$x_1$   $P_1$   $P_{0,1}$
$x_2$   $P_2$   $P_{1,2}$   $P_{0,1,2}$
$x_3$   $P_3$   $P_{2,3}$   $P_{1,2,3}$   $P_{0,1,2,3}$
$x_4$   $P_4$   $P_{3,4}$   $P_{2,3,4}$   $P_{1,2,3,4}$   $P_{0,1,2,3,4}$

The procedure that uses the result of Theorem 3.5 to recursively generate interpolating
polynomial approximations is called Neville's method. The P notation used in Table 3.3
is cumbersome because of the number of subscripts used to represent the entries. Note,
however, that as an array is being constructed, only two subscripts are needed. Proceeding
down the table corresponds to using consecutive points $x_i$ with larger $i$, and proceeding to
the right corresponds to increasing the degree of the interpolating polynomial. Since the
points appear consecutively in each entry, we need to describe only a starting point and the
number of additional points used in constructing the approximation.

Eric Harold Neville (1889-1961) gave this modification of the Lagrange formula in a paper
published in 1932.[N]

To avoid the multiple subscripts, we let $Q_{i,j}(x)$, for $0 \le j \le i$, denote the interpolating
polynomial of degree $j$ on the $(j + 1)$ numbers $x_{i-j}, x_{i-j+1}, \ldots, x_{i-1}, x_i$; that is,

\[ Q_{i,j} = P_{i-j, i-j+1, \ldots, i-1, i}. \]

Using this notation provides the Q notation array in Table 3.4.

Table 3.4
$x_0$   $P_0 = Q_{0,0}$
$x_1$   $P_1 = Q_{1,0}$   $P_{0,1} = Q_{1,1}$
$x_2$   $P_2 = Q_{2,0}$   $P_{1,2} = Q_{2,1}$   $P_{0,1,2} = Q_{2,2}$
$x_3$   $P_3 = Q_{3,0}$   $P_{2,3} = Q_{3,1}$   $P_{1,2,3} = Q_{3,2}$   $P_{0,1,2,3} = Q_{3,3}$
$x_4$   $P_4 = Q_{4,0}$   $P_{3,4} = Q_{4,1}$   $P_{2,3,4} = Q_{4,2}$   $P_{1,2,3,4} = Q_{4,3}$   $P_{0,1,2,3,4} = Q_{4,4}$

Example 2 Values of various interpolating polynomials at $x = 1.5$ were obtained in the Illustration at
the beginning of the section using the data shown in Table 3.5. Apply Neville's method to
the data by constructing a recursive table of the form shown in Table 3.4.

Table 3.5
x     f(x)
1.0   0.7651977
1.3   0.6200860
1.6   0.4554022
1.9   0.2818186
2.2   0.1103623


Solution Let $x_0 = 1.0$, $x_1 = 1.3$, $x_2 = 1.6$, $x_3 = 1.9$, and $x_4 = 2.2$; then $Q_{0,0} = f(1.0)$,
$Q_{1,0} = f(1.3)$, $Q_{2,0} = f(1.6)$, $Q_{3,0} = f(1.9)$, and $Q_{4,0} = f(2.2)$. These are the five
polynomials of degree zero (constants) that approximate $f(1.5)$, and are the same as the data
given in Table 3.5.

Calculating the first-degree approximation $Q_{1,1}(1.5)$ gives

\[ Q_{1,1}(1.5) = \frac{(x - x_0)Q_{1,0} - (x - x_1)Q_{0,0}}{x_1 - x_0}
              = \frac{(1.5 - 1.0)Q_{1,0} - (1.5 - 1.3)Q_{0,0}}{1.3 - 1.0}
              = \frac{0.5(0.6200860) - 0.2(0.7651977)}{0.3} = 0.5233449. \]

Similarly,

\[ Q_{2,1}(1.5) = \frac{(1.5 - 1.3)(0.4554022) - (1.5 - 1.6)(0.6200860)}{1.6 - 1.3} = 0.5102968, \]

\[ Q_{3,1}(1.5) = 0.5132634, \quad \text{and} \quad Q_{4,1}(1.5) = 0.5104270. \]

The best linear approximation is expected to be $Q_{2,1}$ because 1.5 is between $x_1 = 1.3$
and $x_2 = 1.6$.
In a similar manner, approximations using higher-degree polynomials are given by

\[ Q_{2,2}(1.5) = \frac{(1.5 - 1.0)(0.5102968) - (1.5 - 1.6)(0.5233449)}{1.6 - 1.0} = 0.5124715, \]

\[ Q_{3,2}(1.5) = 0.5112857, \quad \text{and} \quad Q_{4,2}(1.5) = 0.5137361. \]

The higher-degree approximations are generated in a similar manner and are shown in
Table 3.6.

Table 3.6
1.0   0.7651977
1.3   0.6200860   0.5233449
1.6   0.4554022   0.5102968   0.5124715
1.9   0.2818186   0.5132634   0.5112857   0.5118127
2.2   0.1103623   0.5104270   0.5137361   0.5118302   0.5118200


If the latest approximation, $Q_{4,4}$, was not sufficiently accurate, another node, $x_5$, could
be selected, and another row added to the table:

$x_5$   $Q_{5,0}$   $Q_{5,1}$   $Q_{5,2}$   $Q_{5,3}$   $Q_{5,4}$   $Q_{5,5}$.

Then $Q_{4,4}$, $Q_{5,4}$, and $Q_{5,5}$ could be compared to determine further accuracy.
The function in Example 2 is the Bessel function of the first kind of order zero, whose
value at 2.5 is $-0.0483838$, and the next row of approximations to $f(1.5)$ is

2.5   -0.0483838   0.4807699   0.5301984   0.5119070   0.5118430   0.5118277.

The final new entry, 0.5118277, is correct to all seven decimal places.

The NumericalAnalysis package in Maple can be used to apply Neville's method for
the values of x and f(x) = y in Table 3.6. After loading the package we define the data
with

xy := [[1.0, 0.7651977], [1.3, 0.6200860], [1.6, 0.4554022], [1.9, 0.2818186]]

Neville's method using this data gives the approximation at x = 1.5 with the command

p3 := PolynomialInterpolation(xy, method = neville, extrapolate = [1.5])

The output from Maple for this command is

POLYINTERP([[1.0, 0.7651977], [1.3, 0.6200860], [1.6, 0.4554022], [1.9, 0.2818186]],
method = neville, extrapolate = [1.5], INFO)

which isn't very informative. To display the information, we enter the command

NevilleTable(p3, 1.5)

and Maple returns an array with four rows and four columns. The nonzero entries correspond
to the top four rows of Table 3.6 (with the first column deleted); the zero entries
are simply used to fill up the array.

To add the additional row to the table using the additional data (2.2, 0.1103623) we
use the command

p3a := AddPoint(p3, [2.2, 0.1103623])

and a new array with all the approximation entries in Table 3.6 is obtained with

NevilleTable(p3a, 1.5)

Example 3 Table 3.7 lists the values of $f(x) = \ln x$ accurate to the places given. Use Neville's method
and four-digit rounding arithmetic to approximate $f(2.1) = \ln 2.1$ by completing the Neville
table.

Table 3.7
$i$   $x_i$   $\ln x_i$
0     2.0     0.6931
1     2.2     0.7885
2     2.3     0.8329

Solution Because $x - x_0 = 0.1$, $x - x_1 = -0.1$, $x - x_2 = -0.2$, and we are given
$Q_{0,0} = 0.6931$, $Q_{1,0} = 0.7885$, and $Q_{2,0} = 0.8329$, we have

\[ Q_{1,1} = \frac{1}{0.2} [(0.1)0.7885 - (-0.1)0.6931] = \frac{0.1482}{0.2} = 0.7410 \]

and

\[ Q_{2,1} = \frac{1}{0.1} [(-0.1)0.8329 - (-0.2)0.7885] = \frac{0.07441}{0.1} = 0.7441. \]

The final approximation we can obtain from this data is

\[ Q_{2,2} = \frac{1}{0.3} [(0.1)0.7441 - (-0.2)0.7410] = \frac{0.2226}{0.3} = 0.7420. \]

These values are shown in Table 3.8.

Table 3.8
$i$   $x_i$   $x - x_i$   $Q_{i,0}$   $Q_{i,1}$   $Q_{i,2}$
0     2.0     0.1         0.6931
1     2.2     -0.1        0.7885      0.7410
2     2.3     -0.2        0.8329      0.7441      0.7420

In the preceding example we have $f(2.1) = \ln 2.1 = 0.7419$ to four decimal places,
so the absolute error is

\[ |f(2.1) - P_2(2.1)| = |0.7419 - 0.7420| = 10^{-4}. \]

However, $f'(x) = 1/x$, $f''(x) = -1/x^2$, and $f'''(x) = 2/x^3$, so the Lagrange error formula
(3.3) in Theorem 3.3 gives the error bound

\[ |f(2.1) - P_2(2.1)| = \left| \frac{f'''(\xi(2.1))}{3!}(x - x_0)(x - x_1)(x - x_2) \right|
   = \left| \frac{1}{3(\xi(2.1))^3}(0.1)(-0.1)(-0.2) \right| \le \frac{0.002}{3(2)^3} = 8.3 \times 10^{-5}. \]

Notice that the actual error, $10^{-4}$, exceeds the error bound, $8.3 \times 10^{-5}$. This apparent
contradiction is a consequence of finite-digit computations. We used four-digit rounding
arithmetic, and the Lagrange error formula (3.3) assumes infinite-digit arithmetic. This
caused our actual errors to exceed the theoretical error estimate.
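The effect of four-digit rounding on the Neville table can be imitated in code. The sketch below is an illustration under an explicit assumption: the hypothetical helper `fl` models four-digit rounding arithmetic by rounding every intermediate result to four significant digits, which is one reasonable reading of the hand computation above rather than a definitive model of it.

```python
import math

def fl(v):
    """Round v to four significant digits (a simple model of four-digit rounding arithmetic)."""
    return float(f"{v:.4g}")

def neville_4digit(xs, fs, x):
    """Neville table at x in which every arithmetic result is rounded with fl."""
    n = len(xs)
    Q = [[0.0] * n for _ in range(n)]
    for i in range(n):
        Q[i][0] = fl(fs[i])
    for i in range(1, n):
        for j in range(1, i + 1):
            a = fl(fl(x - xs[i - j]) * Q[i][j - 1])
            b = fl(fl(x - xs[i]) * Q[i - 1][j - 1])
            Q[i][j] = fl(fl(a - b) / fl(xs[i] - xs[i - j]))
    return Q

# Data of Table 3.7: four-digit values of ln x.
Q = neville_4digit([2.0, 2.2, 2.3], [0.6931, 0.7885, 0.8329], 2.1)
for i in range(3):
    print([Q[i][j] for j in range(i + 1)])

error = abs(fl(math.log(2.1)) - Q[2][2])
print(f"four-digit error ~ {error:.1e}")
```

Running the same recursion in full double precision on the same rounded data gives a slightly different final entry, which is a quick way to separate the rounding effect from the interpolation error itself.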


Remember: You cannot expect more accuracy than the arithmetic provides.

Algorithm 3.1 constructs the entries in Neville's method by rows.

Neville's Iterated Interpolation

To evaluate the interpolating polynomial $P$ on the $n + 1$ distinct numbers $x_0, \ldots, x_n$ at the
number $x$ for the function $f$:

INPUT numbers $x, x_0, x_1, \ldots, x_n$; values $f(x_0), f(x_1), \ldots, f(x_n)$ as the first column
$Q_{0,0}, Q_{1,0}, \ldots, Q_{n,0}$ of $Q$.

OUTPUT the table $Q$ with $P(x) = Q_{n,n}$.

Step 1 For $i = 1, 2, \ldots, n$
         for $j = 1, 2, \ldots, i$
           set $Q_{i,j} = \dfrac{(x - x_{i-j})Q_{i,j-1} - (x - x_i)Q_{i-1,j-1}}{x_i - x_{i-j}}$.

Step 2 OUTPUT ($Q$);
       STOP.
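Algorithm 3.1 maps almost line for line onto a nested loop. The following is a minimal sketch rather than a reference implementation: the function name `neville` and the list-of-lists storage are choices made here for illustration, and the data are those of Table 3.5.

```python
def neville(xs, fs, x):
    """Return the Neville table Q; Q[i][j] interpolates on x_{i-j}, ..., x_i and is evaluated at x."""
    n = len(xs)
    Q = [[None] * n for _ in range(n)]
    for i in range(n):
        Q[i][0] = fs[i]                      # first column: the data values
    for i in range(1, n):                    # Step 1: fill the table row by row
        for j in range(1, i + 1):
            Q[i][j] = ((x - xs[i - j]) * Q[i][j - 1]
                       - (x - xs[i]) * Q[i - 1][j - 1]) / (xs[i] - xs[i - j])
    return Q                                 # Step 2: output the table

xs = [1.0, 1.3, 1.6, 1.9, 2.2]
fs = [0.7651977, 0.6200860, 0.4554022, 0.2818186, 0.1103623]
Q = neville(xs, fs, 1.5)
for i, row in enumerate(Q):
    print(xs[i], " ".join(f"{q:.7f}" for q in row[: i + 1]))
```

The printed rows should agree with Table 3.6 to the seven places shown, with the last entry of the last row being the fourth-degree approximation to $f(1.5)$.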


The algorithm can be modified to allow for the addition of new interpolating nodes.
For example, the inequality

\[ |Q_{i,i} - Q_{i-1,i-1}| < \varepsilon \]

can be used as a stopping criterion, where $\varepsilon$ is a prescribed error tolerance. If the inequality is
true, $Q_{i,i}$ is a reasonable approximation to $f(x)$. If the inequality is false, a new interpolation
point, $x_{i+1}$, is added.
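The stopping criterion can be sketched as a variant of the row-by-row recursion that adds one node at a time. This is an illustration only: the function name `neville_adaptive`, the tolerance $10^{-4}$, and the decision to keep just the previous row in memory are assumptions made here, and the six data points are the Bessel-function values quoted earlier in this section.

```python
def neville_adaptive(xs, fs, x, eps):
    """Add nodes one at a time until successive diagonal entries differ by less than eps.

    Returns (approximation, number_of_nodes_used). Only the previous row is stored.
    """
    prev = [fs[0]]                           # row 0: Q_{0,0}
    for i in range(1, len(xs)):
        row = [fs[i]]                        # Q_{i,0}
        for j in range(1, i + 1):
            row.append(((x - xs[i - j]) * row[j - 1]
                        - (x - xs[i]) * prev[j - 1]) / (xs[i] - xs[i - j]))
        if abs(row[i] - prev[i - 1]) < eps:  # |Q_{i,i} - Q_{i-1,i-1}| < eps
            return row[i], i + 1
        prev = row
    return prev[-1], len(xs)                 # tolerance not met with the available nodes

xs = [1.0, 1.3, 1.6, 1.9, 2.2, 2.5]
fs = [0.7651977, 0.6200860, 0.4554022, 0.2818186, 0.1103623, -0.0483838]
value, used = neville_adaptive(xs, fs, 1.5, 1e-4)
print(f"approximation {value:.7f} using {used} nodes")
```

Agreement of two successive diagonal entries is a heuristic, not a guarantee: it indicates that adding a node changed little, which usually, but not always, signals that the approximation is close to $f(x)$.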


EXERCISE SET 3.2 


1. Use Neville's method to obtain the approximations for Lagrange interpolating polynomials of degrees
one, two, and three to approximate each of the following:

a. $f(8.4)$ if $f(8.1) = 16.94410$, $f(8.3) = 17.56492$, $f(8.6) = 18.50515$, $f(8.7) = 18.82091$

b. $f(-1/3)$ if $f(-0.75) = -0.07181250$, $f(-0.5) = -0.02475000$, $f(-0.25) = 0.33493750$,
$f(0) = 1.10100000$

c. $f(0.25)$ if $f(0.1) = 0.62049958$, $f(0.2) = -0.28398668$, $f(0.3) = 0.00660095$, $f(0.4) = 0.24842440$

d. $f(0.9)$ if $f(0.6) = -0.17694460$, $f(0.7) = 0.01375227$, $f(0.8) = 0.22363362$, $f(1.0) = 0.65809197$

2. Use Neville's method to obtain the approximations for Lagrange interpolating polynomials of degrees
one, two, and three to approximate each of the following:

a. $f(0.43)$ if $f(0) = 1$, $f(0.25) = 1.64872$, $f(0.5) = 2.71828$, $f(0.75) = 4.48169$

b. $f(0)$ if $f(-0.5) = 1.93750$, $f(-0.25) = 1.33203$, $f(0.25) = 0.800781$, $f(0.5) = 0.687500$

c. $f(0.18)$ if $f(0.1) = -0.29004986$, $f(0.2) = -0.56079734$, $f(0.3) = -0.81401972$, $f(0.4) = -1.0526302$

d. $f(0.25)$ if $f(-1) = 0.86199480$, $f(-0.5) = 0.95802009$, $f(0) = 1.0986123$, $f(0.5) = 1.2943767$

3. Use Neville's method to approximate $\sqrt{3}$ with the following functions and values.

a. $f(x) = 3^x$ and the values $x_0 = -2$, $x_1 = -1$, $x_2 = 0$, $x_3 = 1$, and $x_4 = 2$.
b. $f(x) = \sqrt{x}$ and the values $x_0 = 0$, $x_1 = 1$, $x_2 = 2$, $x_3 = 4$, and $x_4 = 5$.
c. Compare the accuracy of the approximation in parts (a) and (b).

4. Let $P_3(x)$ be the interpolating polynomial for the data $(0, 0)$, $(0.5, y)$, $(1, 3)$, and $(2, 2)$. Use Neville's
method to find $y$ if $P_3(1.5) = 0$.

5. Neville's method is used to approximate $f(0.4)$, giving the following table.

$x_0 = 0$      $P_0 = 1$
$x_1 = 0.25$   $P_1 = 2$   $P_{0,1} = 2.6$
$x_2 = 0.5$    $P_2$       $P_{1,2}$         $P_{0,1,2}$
$x_3 = 0.75$   $P_3 = 8$   $P_{2,3} = 2.4$   $P_{1,2,3} = 2.96$   $P_{0,1,2,3} = 3.016$

Determine $P_2 = f(0.5)$.
6. Neville's method is used to approximate $f(0.5)$, giving the following table.

$x_0 = 0$     $P_0 = 0$
$x_1 = 0.4$   $P_1 = 2.8$   $P_{0,1} = 3.5$
$x_2 = 0.7$   $P_2$         $P_{1,2}$         $P_{0,1,2} = \frac{27}{7}$

Determine $P_2 = f(0.7)$.
7. Suppose $x_j = j$, for $j = 0, 1, 2, 3$, and it is known that

\[ P_{0,1}(x) = 2x + 1, \quad P_{0,2}(x) = x + 1, \quad \text{and} \quad P_{1,2,3}(2.5) = 3. \]

Find $P_{0,1,2,3}(2.5)$.

8. Suppose $x_j = j$, for $j = 0, 1, 2, 3$, and it is known that

\[ P_{0,1}(x) = x + 1, \quad P_{1,2}(x) = 3x - 1, \quad \text{and} \quad P_{1,2,3}(1.5) = 4. \]

Find $P_{0,1,2,3}(1.5)$.

9. Neville's Algorithm is used to approximate $f(0)$ using $f(-2)$, $f(-1)$, $f(1)$, and $f(2)$. Suppose
$f(-1)$ was understated by 2 and $f(1)$ was overstated by 3. Determine the error in the original
calculation of the value of the interpolating polynomial to approximate $f(0)$.

10. Neville's Algorithm is used to approximate $f(0)$ using $f(-2)$, $f(-1)$, $f(1)$, and $f(2)$. Suppose
$f(-1)$ was overstated by 2 and $f(1)$ was understated by 3. Determine the error in the original
calculation of the value of the interpolating polynomial to approximate $f(0)$.

11. Construct a sequence of interpolating values $y_n$ to $f(1 + \sqrt{10})$, where $f(x) = (1 + x^2)^{-1}$ for
$-5 \le x \le 5$, as follows: For each $n = 1, 2, \ldots, 10$, let $h = 10/n$ and $y_n = P_n(1 + \sqrt{10})$, where $P_n(x)$
is the interpolating polynomial for $f(x)$ at the nodes $x_0^{(n)}, x_1^{(n)}, \ldots, x_n^{(n)}$ and $x_j^{(n)} = -5 + jh$, for each
$j = 0, 1, 2, \ldots, n$. Does the sequence $\{y_n\}$ appear to converge to $f(1 + \sqrt{10})$?

Inverse Interpolation Suppose $f \in C^1[a, b]$, $f'(x) \ne 0$ on $[a, b]$ and $f$ has one zero $p$ in $[a, b]$.
Let $x_0, \ldots, x_n$ be $n + 1$ distinct numbers in $[a, b]$ with $f(x_k) = y_k$, for each $k = 0, 1, \ldots, n$. To
approximate $p$, construct the interpolating polynomial of degree $n$ on the nodes $y_0, \ldots, y_n$ for $f^{-1}$.
Since $y_k = f(x_k)$ and $0 = f(p)$, it follows that $f^{-1}(y_k) = x_k$ and $p = f^{-1}(0)$. Using iterated
interpolation to approximate $f^{-1}(0)$ is called iterated inverse interpolation.

12. Use iterated inverse interpolation to find an approximation to the solution of $x - e^{-x} = 0$, using the
data

$x$        | 0.3      | 0.4      | 0.5      | 0.6
$e^{-x}$   | 0.740818 | 0.670320 | 0.606531 | 0.548812

13. Construct an algorithm that can be used for inverse interpolation.


| Sa 3.3 Divided Differences 


Iterated interpolation was used in the previous section to generate successively higher-degree 
polynomial approximations at a specific point. Divided-difference methods introduced in 
this section are used to successively generate the polynomials themselves. 
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As in so many areas, Isaac 
Newton is prominent in the study 
of difference equations. He 
developed interpolation formulas 
as early as 1675, using his A 
notation in tables of differences. 
He took a very general approach 
to the difference formulas, so 
explicit examples that he 
produced, including Lagrange’s 
formulas, are often known by 
other names. 
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Suppose that P,,(x) is the nth Lagrange polynomial that agrees with the function f at 
the distinct numbers x9, x1,...,X,. Although this polynomial is unique, there are alternate 
algebraic representations that are useful in certain situations. The divided differences of f 
with respect to x9,X1,...,X, are used to express P,,(x) in the form 


Pn(X) = ao + a1 (x — X0) + a2(x — X0)(% — X1) + +++ + G(X — X90) +++ — Xn-1), G.5) 


for appropriate constants $a_0, a_1, \ldots, a_n$. To determine the first of these constants, $a_0$, note
that if $P_n(x)$ is written in the form of Eq. (3.5), then evaluating $P_n(x)$ at $x_0$ leaves only the
constant term $a_0$; that is,

\[ a_0 = P_n(x_0) = f(x_0). \]

Similarly, when $P_n(x)$ is evaluated at $x_1$, the only nonzero terms in the evaluation of
$P_n(x_1)$ are the constant and linear terms,

\[ f(x_0) + a_1(x_1 - x_0) = P_n(x_1) = f(x_1); \]

so

\[ a_1 = \frac{f(x_1) - f(x_0)}{x_1 - x_0}. \tag{3.6} \]


We now introduce the divided-difference notation, which is related to Aitken's $\Delta^2$
notation used in Section 2.5. The zeroth divided difference of the function $f$ with respect
to $x_i$, denoted $f[x_i]$, is simply the value of $f$ at $x_i$:

\[ f[x_i] = f(x_i). \tag{3.7} \]

The remaining divided differences are defined recursively; the first divided difference
of $f$ with respect to $x_i$ and $x_{i+1}$ is denoted $f[x_i, x_{i+1}]$ and defined as

\[ f[x_i, x_{i+1}] = \frac{f[x_{i+1}] - f[x_i]}{x_{i+1} - x_i}. \tag{3.8} \]

The second divided difference, $f[x_i, x_{i+1}, x_{i+2}]$, is defined as

\[ f[x_i, x_{i+1}, x_{i+2}] = \frac{f[x_{i+1}, x_{i+2}] - f[x_i, x_{i+1}]}{x_{i+2} - x_i}. \]

Similarly, after the $(k-1)$st divided differences,

\[ f[x_i, x_{i+1}, x_{i+2}, \ldots, x_{i+k-1}] \quad\text{and}\quad f[x_{i+1}, x_{i+2}, \ldots, x_{i+k-1}, x_{i+k}], \]

have been determined, the $k$th divided difference relative to $x_i, x_{i+1}, x_{i+2}, \ldots, x_{i+k}$ is

\[ f[x_i, x_{i+1}, \ldots, x_{i+k-1}, x_{i+k}] = \frac{f[x_{i+1}, x_{i+2}, \ldots, x_{i+k}] - f[x_i, x_{i+1}, \ldots, x_{i+k-1}]}{x_{i+k} - x_i}. \tag{3.9} \]

The process ends with the single $n$th divided difference,

\[ f[x_0, x_1, \ldots, x_n] = \frac{f[x_1, x_2, \ldots, x_n] - f[x_0, x_1, \ldots, x_{n-1}]}{x_n - x_0}. \]
Because of Eq. (3.6) we can write $a_1 = f[x_0, x_1]$, just as $a_0$ can be expressed as
$a_0 = f(x_0) = f[x_0]$. Hence the interpolating polynomial in Eq. (3.5) is

\[ P_n(x) = f[x_0] + f[x_0, x_1](x - x_0) + a_2(x - x_0)(x - x_1) + \cdots + a_n(x - x_0)(x - x_1)\cdots(x - x_{n-1}). \]

As might be expected from the evaluation of $a_0$ and $a_1$, the required constants are

\[ a_k = f[x_0, x_1, x_2, \ldots, x_k], \]

for each $k = 0, 1, \ldots, n$. So $P_n(x)$ can be rewritten in a form called Newton's Divided-
Difference:

\[ P_n(x) = f[x_0] + \sum_{k=1}^{n} f[x_0, x_1, \ldots, x_k](x - x_0)\cdots(x - x_{k-1}). \tag{3.10} \]

The value of $f[x_0, x_1, \ldots, x_k]$ is independent of the order of the numbers $x_0, x_1, \ldots, x_k$, as
shown in Exercise 21.

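The generation of the divided differences is outlined in Table 3.9. Two fourth and one
fifth difference can also be determined from these data.

Table 3.9

\[
\begin{array}{lllll}
x & f(x) & \text{First divided differences} & \text{Second divided differences} & \text{Third divided differences} \\
\hline
x_0 & f[x_0] & & & \\
 & & f[x_0,x_1] = \dfrac{f[x_1]-f[x_0]}{x_1-x_0} & & \\
x_1 & f[x_1] & & f[x_0,x_1,x_2] = \dfrac{f[x_1,x_2]-f[x_0,x_1]}{x_2-x_0} & \\
 & & f[x_1,x_2] = \dfrac{f[x_2]-f[x_1]}{x_2-x_1} & & f[x_0,x_1,x_2,x_3] = \dfrac{f[x_1,x_2,x_3]-f[x_0,x_1,x_2]}{x_3-x_0} \\
x_2 & f[x_2] & & f[x_1,x_2,x_3] = \dfrac{f[x_2,x_3]-f[x_1,x_2]}{x_3-x_1} & \\
 & & f[x_2,x_3] = \dfrac{f[x_3]-f[x_2]}{x_3-x_2} & & f[x_1,x_2,x_3,x_4] = \dfrac{f[x_2,x_3,x_4]-f[x_1,x_2,x_3]}{x_4-x_1} \\
x_3 & f[x_3] & & f[x_2,x_3,x_4] = \dfrac{f[x_3,x_4]-f[x_2,x_3]}{x_4-x_2} & \\
 & & f[x_3,x_4] = \dfrac{f[x_4]-f[x_3]}{x_4-x_3} & & f[x_2,x_3,x_4,x_5] = \dfrac{f[x_3,x_4,x_5]-f[x_2,x_3,x_4]}{x_5-x_2} \\
x_4 & f[x_4] & & f[x_3,x_4,x_5] = \dfrac{f[x_4,x_5]-f[x_3,x_4]}{x_5-x_3} & \\
 & & f[x_4,x_5] = \dfrac{f[x_5]-f[x_4]}{x_5-x_4} & & \\
x_5 & f[x_5] & & &
\end{array}
\]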


Newton's Divided-Difference Formula (Algorithm 3.2)

To obtain the divided-difference coefficients of the interpolatory polynomial $P$ on the $(n+1)$
distinct numbers $x_0, x_1, \ldots, x_n$ for the function $f$:

INPUT numbers $x_0, x_1, \ldots, x_n$; values $f(x_0), f(x_1), \ldots, f(x_n)$ as $F_{0,0}, F_{1,0}, \ldots, F_{n,0}$.

OUTPUT the numbers $F_{0,0}, F_{1,1}, \ldots, F_{n,n}$ where

\[ P_n(x) = F_{0,0} + \sum_{i=1}^{n} F_{i,i} \prod_{j=0}^{i-1} (x - x_j). \qquad (F_{i,i} \text{ is } f[x_0, x_1, \ldots, x_i].) \]

Step 1 For $i = 1, 2, \ldots, n$
          For $j = 1, 2, \ldots, i$
              set $F_{i,j} = \dfrac{F_{i,j-1} - F_{i-1,j-1}}{x_i - x_{i-j}}$. $\quad (F_{i,j} = f[x_{i-j}, \ldots, x_i].)$

Step 2 OUTPUT $(F_{0,0}, F_{1,1}, \ldots, F_{n,n})$;
       STOP.
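The steps of Algorithm 3.2 translate almost line for line into code. The following is a minimal Python sketch, not part of the original text: the function names `divided_differences` and `newton_eval` are illustrative choices, and the nested-multiplication evaluator is an added convenience that the algorithm itself does not specify. The sample data are the values tabulated in Table 3.10.

```python
def divided_differences(xs, fs):
    """Return [f[x0], f[x0,x1], ..., f[x0,...,xn]] as in Algorithm 3.2."""
    n = len(xs)
    # F[i][j] holds f[x_{i-j}, ..., x_i]; column 0 is the data itself.
    F = [[0.0] * n for _ in range(n)]
    for i in range(n):
        F[i][0] = fs[i]
    for i in range(1, n):
        for j in range(1, i + 1):
            F[i][j] = (F[i][j - 1] - F[i - 1][j - 1]) / (xs[i] - xs[i - j])
    # The diagonal entries are the Newton coefficients.
    return [F[i][i] for i in range(n)]


def newton_eval(xs, coeffs, x):
    """Evaluate the Newton form (3.10) at x by nested multiplication."""
    result = coeffs[-1]
    for k in range(len(coeffs) - 2, -1, -1):
        result = result * (x - xs[k]) + coeffs[k]
    return result


# Data of Table 3.10.
xs = [1.0, 1.3, 1.6, 1.9, 2.2]
fs = [0.7651977, 0.6200860, 0.4554022, 0.2818186, 0.1103623]

coeffs = divided_differences(xs, fs)
p_at_1_5 = newton_eval(xs, coeffs, 1.5)
print(coeffs)
print(p_at_1_5)
```

Keeping the full lower-triangular array mirrors the algorithm's $F_{i,j}$ notation and makes it easy to print the whole table; a single working column would be enough if only the diagonal is needed.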
The form of the output in Algorithm 3.2 can be modified to produce all the divided
differences, as shown in Example 1.

Example 1 Complete the divided difference table for the data used in Example 1 of Section 3.2, and
reproduced in Table 3.10, and construct the interpolating polynomial that uses all this data.

Table 3.10

x      f(x)
1.0    0.7651977
1.3    0.6200860
1.6    0.4554022
1.9    0.2818186
2.2    0.1103623

Solution The first divided difference involving $x_0$ and $x_1$ is

\[ f[x_0, x_1] = \frac{f[x_1] - f[x_0]}{x_1 - x_0} = \frac{0.6200860 - 0.7651977}{1.3 - 1.0} = -0.4837057. \]

The remaining first divided differences are found in a similar manner and are shown in the
fourth column in Table 3.11.

Table 3.11

i   x_i   f[x_i]      f[x_{i-1},x_i]   f[x_{i-2},x_{i-1},x_i]   f[x_{i-3},...,x_i]   f[x_{i-4},...,x_i]
0   1.0   0.7651977
                      −0.4837057
1   1.3   0.6200860                    −0.1087339
                      −0.5489460                                0.0658784
2   1.6   0.4554022                    −0.0494433                                    0.0018251
                      −0.5786120                                0.0680685
3   1.9   0.2818186                     0.0118183
                      −0.5715210
4   2.2   0.1103623


The second divided difference involving $x_0$, $x_1$, and $x_2$ is

\[ f[x_0, x_1, x_2] = \frac{f[x_1, x_2] - f[x_0, x_1]}{x_2 - x_0} = \frac{-0.5489460 - (-0.4837057)}{1.6 - 1.0} = -0.1087339. \]

The remaining second divided differences are shown in the 5th column of Table 3.11.
The third divided difference involving $x_0$, $x_1$, $x_2$, and $x_3$ and the fourth divided difference
involving all the data points are, respectively,

\[ f[x_0, x_1, x_2, x_3] = \frac{f[x_1, x_2, x_3] - f[x_0, x_1, x_2]}{x_3 - x_0} = \frac{-0.0494433 - (-0.1087339)}{1.9 - 1.0} = 0.0658784, \]

and

\[ f[x_0, x_1, x_2, x_3, x_4] = \frac{f[x_1, x_2, x_3, x_4] - f[x_0, x_1, x_2, x_3]}{x_4 - x_0} = \frac{0.0680685 - 0.0658784}{2.2 - 1.0} = 0.0018251. \]

All the entries are given in Table 3.11.
The coefficients of the Newton forward divided-difference form of the interpolating
polynomial are along the diagonal in the table. This polynomial is

\[
\begin{aligned}
P_4(x) = {} & 0.7651977 - 0.4837057(x - 1.0) - 0.1087339(x - 1.0)(x - 1.3) \\
& + 0.0658784(x - 1.0)(x - 1.3)(x - 1.6) \\
& + 0.0018251(x - 1.0)(x - 1.3)(x - 1.6)(x - 1.9).
\end{aligned}
\]

Notice that the value $P_4(1.5) = 0.5118200$ agrees with the result in Table 3.6 for Example
2 of Section 3.2, as it must because the polynomials are the same.
We can use Maple with the NumericalAnalysis package to create the Newton Divided-
Difference table. First load the package and define the x and f(x) = y values that will be
used to generate the first four rows of Table 3.11.

xy := [[1.0, 0.7651977], [1.3, 0.6200860], [1.6, 0.4554022], [1.9, 0.2818186]]

The command to create the divided-difference table is

p3 := PolynomialInterpolation(xy, independentvar = 'x', method = newton)

A matrix containing the divided-difference table as its nonzero entries is created with the
command

DividedDifferenceTable(p3)

We can add another row to the table with the command

p4 := AddPoint(p3, [2.2, 0.1103623])

which produces the divided-difference table with entries corresponding to those in
Table 3.11.
The Newton form of the interpolation polynomial is created with

Interpolant(p4)

which produces the polynomial in the form of $P_4(x)$ in Example 1, except that in place of
the first two terms of $P_4(x)$:

0.7651977 − 0.4837057(x − 1.0)

Maple gives this as 1.248903367 − 0.4837056667x.

The Mean Value Theorem 1.8 applied to Eq. (3.8) when $i = 0$,

\[ f[x_0, x_1] = \frac{f(x_1) - f(x_0)}{x_1 - x_0}, \]

implies that when $f'$ exists, $f[x_0, x_1] = f'(\xi)$ for some number $\xi$ between $x_0$ and $x_1$. The
following theorem generalizes this result.

Theorem 3.6 Suppose that $f \in C^n[a, b]$ and $x_0, x_1, \ldots, x_n$ are distinct numbers in $[a, b]$. Then a number $\xi$
exists in $(a, b)$ with

\[ f[x_0, x_1, \ldots, x_n] = \frac{f^{(n)}(\xi)}{n!}. \]

Proof Let

\[ g(x) = f(x) - P_n(x). \]

Since $f(x_i) = P_n(x_i)$ for each $i = 0, 1, \ldots, n$, the function $g$ has $n + 1$ distinct zeros in $[a, b]$.
Generalized Rolle's Theorem 1.10 implies that a number $\xi$ in $(a, b)$ exists with $g^{(n)}(\xi) = 0$,
so

\[ 0 = f^{(n)}(\xi) - P_n^{(n)}(\xi). \]

Since $P_n(x)$ is a polynomial of degree $n$ whose leading coefficient is $f[x_0, x_1, \ldots, x_n]$,

\[ P_n^{(n)}(x) = n!\, f[x_0, x_1, \ldots, x_n], \]

for all values of $x$. As a consequence,

\[ f[x_0, x_1, \ldots, x_n] = \frac{f^{(n)}(\xi)}{n!}. \]
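Theorem 3.6 can be checked numerically for a function whose derivatives are known. The sketch below is an illustration that is not in the original text: it assumes the test function $f(x) = e^x$ and four nodes chosen arbitrarily in $[0, 1.5]$, and the helper name `top_divided_difference` is hypothetical. Because every derivative of $e^x$ is $e^x$, the number $\xi$ of the theorem can be recovered as $\xi = \ln\bigl(n!\, f[x_0, \ldots, x_n]\bigr)$ and should lie inside the interval spanned by the nodes.

```python
import math


def top_divided_difference(xs, fs):
    """Return f[x0, ..., xn], overwriting a single working column in place."""
    col = list(fs)
    n = len(xs)
    for j in range(1, n):
        # Sweep upward so col[i - 1] still holds the order (j - 1) value.
        for i in range(n - 1, j - 1, -1):
            col[i] = (col[i] - col[i - 1]) / (xs[i] - xs[i - j])
    return col[-1]


# Assumed test case: f(x) = exp(x) on four nodes in [0, 1.5].
xs = [0.0, 0.5, 1.0, 1.5]
fs = [math.exp(x) for x in xs]
n = len(xs) - 1

dd = top_divided_difference(xs, fs)
# For f = exp, f^(n)(xi) = exp(xi), so xi = ln(n! * f[x0, ..., xn]).
xi = math.log(math.factorial(n) * dd)
print(xi, min(xs) < xi < max(xs))
```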
Newton's divided-difference formula can be expressed in a simplified form when the
nodes are arranged consecutively with equal spacing. In this case, we introduce the notation
$h = x_{i+1} - x_i$, for each $i = 0, 1, \ldots, n - 1$ and let $x = x_0 + sh$. Then the difference $x - x_i$
is $x - x_i = (s - i)h$. So Eq. (3.10) becomes

\[
\begin{aligned}
P_n(x) = P_n(x_0 + sh) &= f[x_0] + sh\, f[x_0, x_1] + s(s - 1)h^2 f[x_0, x_1, x_2] \\
&\quad + \cdots + s(s - 1)\cdots(s - n + 1)h^n f[x_0, x_1, \ldots, x_n] \\
&= f[x_0] + \sum_{k=1}^{n} s(s - 1)\cdots(s - k + 1)h^k f[x_0, x_1, \ldots, x_k].
\end{aligned}
\]

Using binomial-coefficient notation,

\[ \binom{s}{k} = \frac{s(s - 1)\cdots(s - k + 1)}{k!}, \]

we can express $P_n(x)$ compactly as

\[ P_n(x) = P_n(x_0 + sh) = f[x_0] + \sum_{k=1}^{n} \binom{s}{k} k!\, h^k f[x_0, x_1, \ldots, x_k]. \tag{3.11} \]

Forward Differences

The Newton forward-difference formula is constructed by making use of the forward
difference notation $\Delta$ introduced in Aitken's $\Delta^2$ method. With this notation,

\[ f[x_0, x_1] = \frac{f(x_1) - f(x_0)}{x_1 - x_0} = \frac{1}{h}\bigl(f(x_1) - f(x_0)\bigr) = \frac{1}{h}\Delta f(x_0), \]

\[ f[x_0, x_1, x_2] = \frac{1}{2h}\left[\frac{\Delta f(x_1) - \Delta f(x_0)}{h}\right] = \frac{1}{2h^2}\Delta^2 f(x_0), \]

and, in general,

\[ f[x_0, x_1, \ldots, x_k] = \frac{1}{k!\, h^k}\Delta^k f(x_0). \]

Since $f[x_0] = f(x_0)$, Eq. (3.11) has the following form.

Newton Forward-Difference Formula

\[ P_n(x) = f(x_0) + \sum_{k=1}^{n} \binom{s}{k} \Delta^k f(x_0) \tag{3.12} \]
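Formula (3.12) needs only the forward differences at $x_0$ and the generalized binomial coefficients $\binom{s}{k}$, both of which can be built incrementally. The following Python sketch is an illustration rather than part of the original text; the names `forward_differences` and `newton_forward` are hypothetical, and the sample data are assumed to be the equally spaced values of Table 3.10 with $h = 0.3$.

```python
def forward_differences(fs):
    """Return [f(x0), Δf(x0), Δ²f(x0), ...] from equally spaced samples."""
    diffs = [fs[0]]
    col = list(fs)
    while len(col) > 1:
        col = [b - a for a, b in zip(col, col[1:])]
        diffs.append(col[0])
    return diffs


def newton_forward(x0, h, fs, x):
    """Evaluate Eq. (3.12) at x = x0 + s*h."""
    s = (x - x0) / h
    total = 0.0
    binom = 1.0  # C(s, 0)
    for k, d in enumerate(forward_differences(fs)):
        if k > 0:
            binom *= (s - (k - 1)) / k  # C(s, k) = C(s, k-1) * (s-k+1) / k
        total += binom * d
    return total


# Data of Table 3.10: nodes 1.0, 1.3, ..., 2.2.
fs = [0.7651977, 0.6200860, 0.4554022, 0.2818186, 0.1103623]
print(newton_forward(1.0, 0.3, fs, 1.1))
```

Evaluating at $x = 1.1$ gives $s = 1/3$, a point near the start of the table where the forward form uses the closest nodes first.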


Backward Differences 


If the interpolating nodes are reordered from last to first as $x_n, x_{n-1}, \ldots, x_0$, we can write
the interpolatory formula as

\[
\begin{aligned}
P_n(x) = {} & f[x_n] + f[x_n, x_{n-1}](x - x_n) + f[x_n, x_{n-1}, x_{n-2}](x - x_n)(x - x_{n-1}) \\
& + \cdots + f[x_n, \ldots, x_0](x - x_n)(x - x_{n-1})\cdots(x - x_1).
\end{aligned}
\]

If, in addition, the nodes are equally spaced with $x = x_n + sh$ and $x = x_i + (s + n - i)h$,
then

\[
\begin{aligned}
P_n(x) = P_n(x_n + sh) = {} & f[x_n] + sh\, f[x_n, x_{n-1}] + s(s + 1)h^2 f[x_n, x_{n-1}, x_{n-2}] + \cdots \\
& + s(s + 1)\cdots(s + n - 1)h^n f[x_n, \ldots, x_0].
\end{aligned}
\]

This is used to derive a commonly applied formula known as the Newton backward-
difference formula. To discuss this formula, we need the following definition.

Definition 3.7 Given the sequence $\{p_n\}_{n=0}^{\infty}$, define the backward difference $\nabla p_n$ (read nabla $p_n$) by

\[ \nabla p_n = p_n - p_{n-1}, \quad \text{for } n \ge 1. \]

Higher powers are defined recursively by

\[ \nabla^k p_n = \nabla(\nabla^{k-1} p_n), \quad \text{for } k \ge 2. \]

Definition 3.7 implies that

\[ f[x_n, x_{n-1}] = \frac{1}{h}\nabla f(x_n), \qquad f[x_n, x_{n-1}, x_{n-2}] = \frac{1}{2h^2}\nabla^2 f(x_n), \]

and, in general,

\[ f[x_n, x_{n-1}, \ldots, x_{n-k}] = \frac{1}{k!\, h^k}\nabla^k f(x_n). \]

Consequently,

\[ P_n(x) = f[x_n] + s\nabla f(x_n) + \frac{s(s + 1)}{2}\nabla^2 f(x_n) + \cdots + \frac{s(s + 1)\cdots(s + n - 1)}{n!}\nabla^n f(x_n). \]

If we extend the binomial coefficient notation to include all real values of $s$ by letting

\[ \binom{-s}{k} = \frac{-s(-s - 1)\cdots(-s - k + 1)}{k!} = (-1)^k \frac{s(s + 1)\cdots(s + k - 1)}{k!}, \]

then

\[ P_n(x) = f[x_n] + (-1)^1\binom{-s}{1}\nabla f(x_n) + (-1)^2\binom{-s}{2}\nabla^2 f(x_n) + \cdots + (-1)^n\binom{-s}{n}\nabla^n f(x_n). \]

This gives the following result.

Newton Backward-Difference Formula

\[ P_n(x) = f[x_n] + \sum_{k=1}^{n} (-1)^k \binom{-s}{k} \nabla^k f(x_n) \tag{3.13} \]
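Since $(-1)^k\binom{-s}{k} = s(s+1)\cdots(s+k-1)/k!$, formula (3.13) can be evaluated without ever forming a negative-argument binomial coefficient. The Python sketch below is illustrative only: the function name `newton_backward` is hypothetical, and the data are assumed to be the values of Table 3.10 with last node $x_n = 2.2$ and $h = 0.3$.

```python
def newton_backward(xn, h, fs, x):
    """Evaluate Eq. (3.13) at x = xn + s*h (s is negative inside the table)."""
    s = (x - xn) / h
    # Collect the backward differences at the last node: ∇^k f(x_n).
    col = list(fs)
    nablas = [col[-1]]
    while len(col) > 1:
        col = [b - a for a, b in zip(col, col[1:])]
        nablas.append(col[-1])
    total = 0.0
    coeff = 1.0
    for k, d in enumerate(nablas):
        if k > 0:
            # coeff = s(s+1)...(s+k-1)/k!, which equals (-1)^k C(-s, k)
            coeff *= (s + (k - 1)) / k
        total += coeff * d
    return total


# Data of Table 3.10: nodes 1.0, 1.3, ..., 2.2.
fs = [0.7651977, 0.6200860, 0.4554022, 0.2818186, 0.1103623]
print(newton_backward(2.2, 0.3, fs, 2.0))
```

Here $x = 2.0$ corresponds to $s = -2/3$, so the nodes nearest the end of the table enter the sum first.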
Illustration The divided-difference Table 3.12 corresponds to the data in Example 1.

Table 3.12

x     f(x)        First divided   Second divided   Third divided   Fourth divided
                  differences     differences      differences     differences
1.0   0.7651977
                  −0.4837057
1.3   0.6200860                   −0.1087339
                  −0.5489460                       0.0658784
1.6   0.4554022                   −0.0494433                       0.0018251
                  −0.5786120                       0.0680685
1.9   0.2818186                    0.0118183
                  −0.5715210
2.2   0.1103623

Only one interpolating polynomial of degree at most 4 uses these five data points, but we
will organize the data points to obtain the best interpolation approximations of degrees 1,
2, and 3. This will give us a sense of accuracy of the fourth-degree approximation for the
given value of x.

If an approximation to $f(1.1)$ is required, the reasonable choice for the nodes would
be $x_0 = 1.0$, $x_1 = 1.3$, $x_2 = 1.6$, $x_3 = 1.9$, and $x_4 = 2.2$ since this choice makes the
earliest possible use of the data points closest to $x = 1.1$, and also makes use of the fourth
divided difference. This implies that $h = 0.3$ and $s = \frac{1}{3}$, so the Newton forward divided-
difference formula is used with the divided differences that have a solid underline in
Table 3.12 (the entries along the top diagonal of the table):

\[
\begin{aligned}
P_4(1.1) &= P_4\left(1.0 + \tfrac{1}{3}(0.3)\right) \\
&= 0.7651977 + \tfrac{1}{3}(0.3)(-0.4837057) + \tfrac{1}{3}\left(-\tfrac{2}{3}\right)(0.3)^2(-0.1087339) \\
&\quad + \tfrac{1}{3}\left(-\tfrac{2}{3}\right)\left(-\tfrac{5}{3}\right)(0.3)^3(0.0658784) \\
&\quad + \tfrac{1}{3}\left(-\tfrac{2}{3}\right)\left(-\tfrac{5}{3}\right)\left(-\tfrac{8}{3}\right)(0.3)^4(0.0018251) \\
&= 0.7196460.
\end{aligned}
\]

To approximate a value when x is close to the end of the tabulated values, say, $x = 2.0$, we
would again like to make the earliest use of the data points closest to x. This requires using
the Newton backward divided-difference formula with $s = -\frac{2}{3}$ and the divided differences
in Table 3.12 that have a wavy underline (the entries along the bottom diagonal of the table).
Notice that the fourth divided difference is used in both formulas.

\[
\begin{aligned}
P_4(2.0) &= P_4\left(2.2 - \tfrac{2}{3}(0.3)\right) \\
&= 0.1103623 - \tfrac{2}{3}(0.3)(-0.5715210) - \tfrac{2}{3}\left(\tfrac{1}{3}\right)(0.3)^2(0.0118183) \\
&\quad - \tfrac{2}{3}\left(\tfrac{1}{3}\right)\left(\tfrac{4}{3}\right)(0.3)^3(0.0680685) - \tfrac{2}{3}\left(\tfrac{1}{3}\right)\left(\tfrac{4}{3}\right)\left(\tfrac{7}{3}\right)(0.3)^4(0.0018251) \\
&= 0.2238754.
\end{aligned}
\]
Centered Differences

The Newton forward- and backward-difference formulas are not appropriate for approximating
$f(x)$ when x lies near the center of the table because neither will permit the highest-order
difference to have $x_0$ close to x. A number of divided-difference formulas are available for
this case, each of which has situations when it can be used to maximum advantage. These
methods are known as centered-difference formulas. We will consider only one centered-
difference formula, Stirling's method.

For the centered-difference formulas, we choose $x_0$ near the point being approximated
and label the nodes directly below $x_0$ as $x_1, x_2, \ldots$ and those directly above as $x_{-1}, x_{-2}, \ldots$.
With this convention, Stirling's formula is given by

\[
\begin{aligned}
P_n(x) = P_{2m+1}(x) = {} & f[x_0] + \frac{sh}{2}\bigl(f[x_{-1}, x_0] + f[x_0, x_1]\bigr) + s^2h^2 f[x_{-1}, x_0, x_1] \\
& + \frac{s(s^2 - 1)h^3}{2}\bigl(f[x_{-2}, x_{-1}, x_0, x_1] + f[x_{-1}, x_0, x_1, x_2]\bigr) \\
& + \cdots + s^2(s^2 - 1)(s^2 - 4)\cdots(s^2 - (m - 1)^2)h^{2m} f[x_{-m}, \ldots, x_m] \\
& + \frac{s(s^2 - 1)\cdots(s^2 - m^2)h^{2m+1}}{2}\bigl(f[x_{-m-1}, \ldots, x_m] + f[x_{-m}, \ldots, x_{m+1}]\bigr),
\end{aligned}
\tag{3.14}
\]

if $n = 2m + 1$ is odd. If $n = 2m$ is even, we use the same formula but delete the last line.
The entries used for this formula are underlined in Table 3.13.

James Stirling (1692-1770) published this and numerous other formulas in Methodus
Differentialis in 1720. Techniques for accelerating the convergence of various series are
included in this work.

Table 3.13

\[
\begin{array}{llllll}
x & f(x) & \text{First divided} & \text{Second divided} & \text{Third divided} & \text{Fourth divided} \\
 & & \text{differences} & \text{differences} & \text{differences} & \text{differences} \\
\hline
x_{-2} & f[x_{-2}] & & & & \\
 & & f[x_{-2}, x_{-1}] & & & \\
x_{-1} & f[x_{-1}] & & f[x_{-2}, x_{-1}, x_0] & & \\
 & & f[x_{-1}, x_0] & & f[x_{-2}, x_{-1}, x_0, x_1] & \\
x_0 & f[x_0] & & f[x_{-1}, x_0, x_1] & & f[x_{-2}, x_{-1}, x_0, x_1, x_2] \\
 & & f[x_0, x_1] & & f[x_{-1}, x_0, x_1, x_2] & \\
x_1 & f[x_1] & & f[x_0, x_1, x_2] & & \\
 & & f[x_1, x_2] & & & \\
x_2 & f[x_2] & & & &
\end{array}
\]


Example 2 Consider the table of data given in the previous examples. Use Stirling's formula to approximate
$f(1.5)$ with $x_0 = 1.6$.

Solution To apply Stirling's formula we use the underlined entries in the difference
Table 3.14.

Table 3.14

x     f(x)        First divided   Second divided   Third divided   Fourth divided
                  differences     differences      differences     differences
1.0   0.7651977
                  −0.4837057
1.3   0.6200860                   −0.1087339
                  −0.5489460                       0.0658784
1.6   0.4554022                   −0.0494433                       0.0018251
                  −0.5786120                       0.0680685
1.9   0.2818186                    0.0118183
                  −0.5715210
2.2   0.1103623

The formula, with $h = 0.3$, $x_0 = 1.6$, and $s = -\frac{1}{3}$, becomes

\[
\begin{aligned}
f(1.5) \approx P_4\left(1.6 + \left(-\tfrac{1}{3}\right)(0.3)\right)
&= 0.4554022 + \left(-\tfrac{1}{3}\right)\left(\tfrac{0.3}{2}\right)\bigl((-0.5489460) + (-0.5786120)\bigr) \\
&\quad + \left(-\tfrac{1}{3}\right)^2(0.3)^2(-0.0494433) \\
&\quad + \tfrac{1}{2}\left(-\tfrac{1}{3}\right)\left(\left(-\tfrac{1}{3}\right)^2 - 1\right)(0.3)^3(0.0658784 + 0.0680685) \\
&\quad + \left(-\tfrac{1}{3}\right)^2\left(\left(-\tfrac{1}{3}\right)^2 - 1\right)(0.3)^4(0.0018251) = 0.5118200.
\end{aligned}
\]
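The computation in Example 2 can be reproduced with a short routine. The Python sketch below is illustrative and not part of the original text: it assumes an odd number $2m + 1$ of equally spaced nodes centered on $x_0$ (the case $n = 2m$ of Eq. (3.14)), the function names are hypothetical, and the divided differences are computed recursively from definition (3.9) for brevity rather than efficiency.

```python
def divided_difference(xs, fs):
    """f[xs[0], ..., xs[-1]] by the recursive definition (3.9)."""
    if len(xs) == 1:
        return fs[0]
    return (divided_difference(xs[1:], fs[1:])
            - divided_difference(xs[:-1], fs[:-1])) / (xs[-1] - xs[0])


def stirling(xs, fs, x):
    """Stirling's formula on 2m+1 equally spaced nodes; xs[m] plays the role of x_0."""
    m = len(xs) // 2
    h = xs[1] - xs[0]
    s = (x - xs[m]) / h
    total = fs[m]
    q = 1.0  # product (s^2 - 1)(s^2 - 4)...(s^2 - (k-1)^2); empty product for k = 1
    for k in range(1, m + 1):
        lo, hi = m - k, m + k
        # Two divided differences of order 2k-1 straddling x_0.
        odd = (divided_difference(xs[lo:hi], fs[lo:hi])
               + divided_difference(xs[lo + 1:hi + 1], fs[lo + 1:hi + 1]))
        # Single divided difference of order 2k centered on x_0.
        even = divided_difference(xs[lo:hi + 1], fs[lo:hi + 1])
        total += s * q * h ** (2 * k - 1) / 2 * odd
        total += s * s * q * h ** (2 * k) * even
        q *= s * s - k * k
    return total


# Data of Table 3.14, centered on x_0 = 1.6.
xs = [1.0, 1.3, 1.6, 1.9, 2.2]
fs = [0.7651977, 0.6200860, 0.4554022, 0.2818186, 0.1103623]
print(stirling(xs, fs, 1.5))
```

Because all five points are used, the result is the same fourth-degree interpolating polynomial obtained earlier, only assembled from the entries nearest the center of the table.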


Most texts on numerical analysis written before the widespread use of computers have
extensive treatments of divided-difference methods. If a more comprehensive treatment of
this subject is needed, the book by Hildebrand [Hild] is a particularly good reference.


EXERCISE SET 3.3

1. Use Eq. (3.10) or Algorithm 3.2 to construct interpolating polynomials of degree one, two, and three
for the following data. Approximate the specified value using each of the polynomials.

a. f(8.4) if f(8.1) = 16.94410, f(8.3) = 17.56492, f(8.6) = 18.50515, f(8.7) = 18.82091

b. f(0.9) if f(0.6) = −0.17694460, f(0.7) = 0.01375227, f(0.8) = 0.22363362, f(1.0) =
0.65809197

2. Use Eq. (3.10) or Algorithm 3.2 to construct interpolating polynomials of degree one, two, and three
for the following data. Approximate the specified value using each of the polynomials.

a. f(0.43) if f(0) = 1, f(0.25) = 1.64872, f(0.5) = 2.71828, f(0.75) = 4.48169

b. f(0) if f(−0.5) = 1.93750, f(−0.25) = 1.33203, f(0.25) = 0.800781, f(0.5) = 0.687500

3. Use the Newton forward-difference formula to construct interpolating polynomials of degree one,
two, and three for the following data. Approximate the specified value using each of the polynomials.

a. f(−1/3) if f(−0.75) = −0.07181250, f(−0.5) = −0.02475000, f(−0.25) = 0.33493750,
f(0) = 1.10100000

b. f(0.25) if f(0.1) = −0.62049958, f(0.2) = −0.28398668, f(0.3) = 0.00660095, f(0.4) =
0.24842440

4. Use the Newton forward-difference formula to construct interpolating polynomials of degree one,
two, and three for the following data. Approximate the specified value using each of the polynomials.

a. f(0.43) if f(0) = 1, f(0.25) = 1.64872, f(0.5) = 2.71828, f(0.75) = 4.48169

b. f(0.18) if f(0.1) = −0.29004986, f(0.2) = −0.56079734, f(0.3) = −0.81401972, f(0.4) =
−1.0526302

5. Use the Newton backward-difference formula to construct interpolating polynomials of degree one,
two, and three for the following data. Approximate the specified value using each of the polynomials.

a. f(−1/3) if f(−0.75) = −0.07181250, f(−0.5) = −0.02475000, f(−0.25) = 0.33493750,
f(0) = 1.10100000

b. f(0.25) if f(0.1) = −0.62049958, f(0.2) = −0.28398668, f(0.3) = 0.00660095, f(0.4) =
0.24842440
6. Use the Newton backward-difference formula to construct interpolating polynomials of degree one,
two, and three for the following data. Approximate the specified value using each of the polynomials.

a. f(0.43) if f(0) = 1, f(0.25) = 1.64872, f(0.5) = 2.71828, f(0.75) = 4.48169

b. f(0.25) if f(−1) = 0.86199480, f(−0.5) = 0.95802009, f(0) = 1.0986123, f(0.5) =
1.2943767

7. a. Use Algorithm 3.2 to construct the interpolating polynomial of degree three for the unequally
spaced points given in the following table:

x    | f(x)
−0.1 | 5.30000
0.0  | 2.00000
0.2  | 3.19000
0.3  | 1.00000

b. Add f(0.35) = 0.97260 to the table, and construct the interpolating polynomial of degree four.

8. a. Use Algorithm 3.2 to construct the interpolating polynomial of degree four for the unequally
spaced points given in the following table:

x   | f(x)
0.0 | −6.00000
0.1 | −5.89483
0.3 | −5.65014
0.6 | −5.17788
1.0 | −4.28172

b. Add f(1.1) = −3.99583 to the table, and construct the interpolating polynomial of degree five.

9. a. Approximate f(0.05) using the following data and the Newton forward-difference formula:

x    | 0.0     | 0.2     | 0.4     | 0.6     | 0.8
f(x) | 1.00000 | 1.22140 | 1.49182 | 1.82212 | 2.22554

b. Use the Newton backward-difference formula to approximate f(0.65).

c. Use Stirling's formula to approximate f(0.43).

10. Show that the polynomial interpolating the following data has degree 3.

x    | −2 | −1 | 0  | 1  | 2  | 3
f(x) | 1  | 4  | 11 | 16 | 13 | −4

11. a. Show that the cubic polynomials

\[ P(x) = 3 - 2(x + 1) + 0(x + 1)(x) + (x + 1)(x)(x - 1) \]

and

\[ Q(x) = -1 + 4(x + 2) - 3(x + 2)(x + 1) + (x + 2)(x + 1)(x) \]

both interpolate the data

x    | −2 | −1 | 0 | 1  | 2
f(x) | −1 | 3  | 1 | −1 | 3

b. Why does part (a) not violate the uniqueness property of interpolating polynomials?

12. A fourth-degree polynomial $P(x)$ satisfies $\Delta^4 P(0) = 24$, $\Delta^3 P(0) = 6$, and $\Delta^2 P(0) = 0$, where
$\Delta P(x) = P(x + 1) - P(x)$. Compute $\Delta^2 P(10)$.
13. The following data are given for a polynomial $P(x)$ of unknown degree.

x    | 0 | 1  | 2
P(x) | 2 | −1 | 4

Determine the coefficient of $x^2$ in $P(x)$ if all third-order forward differences are 1.

14. The following data are given for a polynomial $P(x)$ of unknown degree.

x    | 0 | 1 | 2  | 3
P(x) | 4 | 9 | 15 | 18

Determine the coefficient of $x^3$ in $P(x)$ if all fourth-order forward differences are 1.

15. The Newton forward-difference formula is used to approximate f(0.3) given the following data.

x    | 0.0  | 0.2  | 0.4  | 0.6
f(x) | 15.0 | 21.0 | 30.0 | 51.0

Suppose it is discovered that f(0.4) was understated by 10 and f(0.6) was overstated by 5. By what
amount should the approximation to f(0.3) be changed?

16. For a function f, the Newton divided-difference formula gives the interpolating polynomial

\[ P_3(x) = 1 + 4x + 4x(x - 0.25) + \frac{16}{3}x(x - 0.25)(x - 0.5), \]

on the nodes $x_0 = 0$, $x_1 = 0.25$, $x_2 = 0.5$ and $x_3 = 0.75$. Find f(0.75).


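17. For a function f, the forward-divided differences are given by

\[
\begin{array}{llll}
x_0 = 0.0 & f[x_0] & & \\
 & & f[x_0, x_1] & \\
x_1 = 0.4 & f[x_1] & & f[x_0, x_1, x_2] = \frac{50}{7} \\
 & & f[x_1, x_2] = 10 & \\
x_2 = 0.7 & f[x_2] = 6 & &
\end{array}
\]

Determine the missing entries in the table.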


18. a. The introduction to this chapter included a table listing the population of the United States from
1950 to 2000. Use appropriate divided differences to approximate the population in the years
1940, 1975, and 2020.

b. The population in 1940 was approximately 132,165,000. How accurate do you think your 1975
and 2020 figures are?

19. Given

\[
\begin{aligned}
P_n(x) = {} & f[x_0] + f[x_0, x_1](x - x_0) + a_2(x - x_0)(x - x_1) \\
& + a_3(x - x_0)(x - x_1)(x - x_2) + \cdots \\
& + a_n(x - x_0)(x - x_1)\cdots(x - x_{n-1}),
\end{aligned}
\]

use $P_n(x_2)$ to show that $a_2 = f[x_0, x_1, x_2]$.

20. Show that

\[ f[x_0, x_1, \ldots, x_n, x] = \frac{f^{(n+1)}(\xi(x))}{(n + 1)!}, \]

for some $\xi(x)$. [Hint: From Eq. (3.3),

\[ f(x) = P_n(x) + \frac{f^{(n+1)}(\xi(x))}{(n + 1)!}(x - x_0)\cdots(x - x_n). \]

Considering the interpolation polynomial of degree $n + 1$ on $x_0, x_1, \ldots, x_n, x$, we have

\[ f(x) = P_{n+1}(x) = P_n(x) + f[x_0, x_1, \ldots, x_n, x](x - x_0)\cdots(x - x_n).] \]

21. Let $i_0, i_1, \ldots, i_n$ be a rearrangement of the integers $0, 1, \ldots, n$. Show that $f[x_{i_0}, x_{i_1}, \ldots, x_{i_n}] =
f[x_0, x_1, \ldots, x_n]$. [Hint: Consider the leading coefficient of the nth Lagrange polynomial on the
data $\{x_0, x_1, \ldots, x_n\} = \{x_{i_0}, x_{i_1}, \ldots, x_{i_n}\}$.]


3.4 Hermite Interpolation

Osculating polynomials generalize both the Taylor polynomials and the Lagrange polynomials.
Suppose that we are given $n + 1$ distinct numbers $x_0, x_1, \ldots, x_n$ in $[a, b]$ and nonnegative
integers $m_0, m_1, \ldots, m_n$, and $m = \max\{m_0, m_1, \ldots, m_n\}$. The osculating polynomial
approximating a function $f \in C^m[a, b]$ at $x_i$, for each $i = 0, \ldots, n$, is the polynomial of
least degree that has the same values as the function $f$ and all its derivatives of order less
than or equal to $m_i$ at each $x_i$. The degree of this osculating polynomial is at most

\[ M = \sum_{i=0}^{n} m_i + n \]

because the number of conditions to be satisfied is $\sum_{i=0}^{n} m_i + (n + 1)$, and a polynomial of
degree $M$ has $M + 1$ coefficients that can be used to satisfy these conditions.

The Latin word osculum, literally a "small mouth" or "kiss", when applied to a curve indicates
that it just touches and has the same shape. Hermite interpolation has this osculating
property. It matches a given curve, and its derivative forces the interpolating curve to "kiss"
the given curve.

Definition 3.8 Let $x_0, x_1, \ldots, x_n$ be $n + 1$ distinct numbers in $[a, b]$ and for $i = 0, 1, \ldots, n$ let $m_i$ be a
nonnegative integer. Suppose that $f \in C^m[a, b]$, where $m = \max_{0 \le i \le n} m_i$.

The osculating polynomial approximating $f$ is the polynomial $P(x)$ of least degree
such that

\[ \frac{d^k P(x_i)}{dx^k} = \frac{d^k f(x_i)}{dx^k}, \quad \text{for each } i = 0, 1, \ldots, n \text{ and } k = 0, 1, \ldots, m_i. \]

Charles Hermite (1822-1901) made significant mathematical discoveries throughout his life in
areas such as complex analysis and number theory, particularly involving the theory of
equations. He is perhaps best known for proving in 1873 that $e$ is transcendental, that is, it is
not the solution to any algebraic equation having integer coefficients. This led in 1882 to
Lindemann's proof that $\pi$ is also transcendental, which demonstrated that it is impossible
to use the standard geometry tools of Euclid to construct a square that has the same area as a
unit circle.

Note that when $n = 0$, the osculating polynomial approximating $f$ is the $m_0$th Taylor
polynomial for $f$ at $x_0$. When $m_i = 0$ for each $i$, the osculating polynomial is the $n$th
Lagrange polynomial interpolating $f$ on $x_0, x_1, \ldots, x_n$.

Hermite Polynomials

The case when $m_i = 1$, for each $i = 0, 1, \ldots, n$, gives the Hermite polynomials. For a given
function $f$, these polynomials agree with $f$ at $x_0, x_1, \ldots, x_n$. In addition, since their first
derivatives agree with those of $f$, they have the same "shape" as the function at $(x_i, f(x_i))$ in
the sense that the tangent lines to the polynomial and the function agree. We will restrict our
study of osculating polynomials to this situation and consider first a theorem that describes
precisely the form of the Hermite polynomials.

Theorem 3.9 If $f \in C^1[a, b]$ and $x_0, \ldots, x_n \in [a, b]$ are distinct, the unique polynomial of least degree
agreeing with $f$ and $f'$ at $x_0, \ldots, x_n$ is the Hermite polynomial of degree at most $2n + 1$
given by

\[ H_{2n+1}(x) = \sum_{j=0}^{n} f(x_j) H_{n,j}(x) + \sum_{j=0}^{n} f'(x_j) \hat{H}_{n,j}(x), \]

where, for $L_{n,j}(x)$ denoting the $j$th Lagrange coefficient polynomial of degree $n$, we have

\[ H_{n,j}(x) = [1 - 2(x - x_j)L'_{n,j}(x_j)]L_{n,j}^2(x) \quad\text{and}\quad \hat{H}_{n,j}(x) = (x - x_j)L_{n,j}^2(x). \]

Moreover, if $f \in C^{2n+2}[a, b]$, then

\[ f(x) = H_{2n+1}(x) + \frac{(x - x_0)^2 \cdots (x - x_n)^2}{(2n + 2)!} f^{(2n+2)}(\xi(x)), \]

for some (generally unknown) $\xi(x)$ in the interval $(a, b)$.

Hermite gave a description of a general osculatory polynomial in a letter to Carl W. Borchardt
in 1878, to whom he regularly sent his new results. His demonstration is an interesting
application of the use of complex integration techniques to solve a real-valued problem.


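Proof First recall that

\[ L_{n,j}(x_i) = \begin{cases} 0, & \text{if } i \ne j, \\ 1, & \text{if } i = j. \end{cases} \]

Hence when $i \ne j$,

\[ H_{n,j}(x_i) = 0 \quad\text{and}\quad \hat{H}_{n,j}(x_i) = 0, \]

whereas, for each $i$,

\[ H_{n,i}(x_i) = [1 - 2(x_i - x_i)L'_{n,i}(x_i)] \cdot 1 = 1 \quad\text{and}\quad \hat{H}_{n,i}(x_i) = (x_i - x_i) \cdot 1^2 = 0. \]

As a consequence

\[ H_{2n+1}(x_i) = \sum_{\substack{j=0 \\ j \ne i}}^{n} f(x_j) \cdot 0 + f(x_i) \cdot 1 + \sum_{j=0}^{n} f'(x_j) \cdot 0 = f(x_i), \]

so $H_{2n+1}$ agrees with $f$ at $x_0, x_1, \ldots, x_n$.

To show the agreement of $H'_{2n+1}$ with $f'$ at the nodes, first note that $L_{n,j}(x)$ is a factor
of $H'_{n,j}(x)$, so $H'_{n,j}(x_i) = 0$ when $i \ne j$. In addition, when $i = j$ we have $L_{n,i}(x_i) = 1$, so

\[
\begin{aligned}
H'_{n,i}(x_i) &= -2L'_{n,i}(x_i) \cdot L_{n,i}^2(x_i) + [1 - 2(x_i - x_i)L'_{n,i}(x_i)]\, 2L_{n,i}(x_i)L'_{n,i}(x_i) \\
&= -2L'_{n,i}(x_i) + 2L'_{n,i}(x_i) = 0.
\end{aligned}
\]

Hence, $H'_{n,j}(x_i) = 0$ for all $i$ and $j$.

Finally,

\[
\begin{aligned}
\hat{H}'_{n,j}(x_i) &= L_{n,j}^2(x_i) + (x_i - x_j)\, 2L_{n,j}(x_i)L'_{n,j}(x_i) \\
&= L_{n,j}(x_i)[L_{n,j}(x_i) + 2(x_i - x_j)L'_{n,j}(x_i)],
\end{aligned}
\]

so $\hat{H}'_{n,j}(x_i) = 0$ if $i \ne j$ and $\hat{H}'_{n,i}(x_i) = 1$. Combining these facts, we have

\[ H'_{2n+1}(x_i) = \sum_{j=0}^{n} f(x_j) \cdot 0 + \sum_{\substack{j=0 \\ j \ne i}}^{n} f'(x_j) \cdot 0 + f'(x_i) \cdot 1 = f'(x_i). \]

Therefore, $H_{2n+1}$ agrees with $f$ and $H'_{2n+1}$ with $f'$ at $x_0, x_1, \ldots, x_n$.
The uniqueness of this polynomial and the error formula are considered in
Exercise 11.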
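Example 1 Use the Hermite polynomial that agrees with the data listed in Table 3.15 to find an approximation
of $f(1.5)$.

Table 3.15

k   x_k   f(x_k)      f'(x_k)
0   1.3   0.6200860   −0.5220232
1   1.6   0.4554022   −0.5698959
2   1.9   0.2818186   −0.5811571

Solution We first compute the Lagrange polynomials and their derivatives. This gives

\[ L_{2,0}(x) = \frac{(x - x_1)(x - x_2)}{(x_0 - x_1)(x_0 - x_2)} = \frac{50}{9}x^2 - \frac{175}{9}x + \frac{152}{9}, \qquad L'_{2,0}(x) = \frac{100}{9}x - \frac{175}{9}; \]

\[ L_{2,1}(x) = \frac{(x - x_0)(x - x_2)}{(x_1 - x_0)(x_1 - x_2)} = \frac{-100}{9}x^2 + \frac{320}{9}x - \frac{247}{9}, \qquad L'_{2,1}(x) = \frac{-200}{9}x + \frac{320}{9}; \]

and

\[ L_{2,2}(x) = \frac{(x - x_0)(x - x_1)}{(x_2 - x_0)(x_2 - x_1)} = \frac{50}{9}x^2 - \frac{145}{9}x + \frac{104}{9}, \qquad L'_{2,2}(x) = \frac{100}{9}x - \frac{145}{9}. \]

The polynomials $H_{2,j}(x)$ and $\hat{H}_{2,j}(x)$ are then

\[
\begin{aligned}
H_{2,0}(x) &= [1 - 2(x - 1.3)(-5)]\left(\frac{50}{9}x^2 - \frac{175}{9}x + \frac{152}{9}\right)^2
= (10x - 12)\left(\frac{50}{9}x^2 - \frac{175}{9}x + \frac{152}{9}\right)^2, \\
H_{2,1}(x) &= 1 \cdot \left(\frac{-100}{9}x^2 + \frac{320}{9}x - \frac{247}{9}\right)^2, \\
H_{2,2}(x) &= 10(2 - x)\left(\frac{50}{9}x^2 - \frac{145}{9}x + \frac{104}{9}\right)^2, \\
\hat{H}_{2,0}(x) &= (x - 1.3)\left(\frac{50}{9}x^2 - \frac{175}{9}x + \frac{152}{9}\right)^2, \\
\hat{H}_{2,1}(x) &= (x - 1.6)\left(\frac{-100}{9}x^2 + \frac{320}{9}x - \frac{247}{9}\right)^2,
\end{aligned}
\]

and

\[ \hat{H}_{2,2}(x) = (x - 1.9)\left(\frac{50}{9}x^2 - \frac{145}{9}x + \frac{104}{9}\right)^2. \]

Finally

\[
\begin{aligned}
H_5(x) = {} & 0.6200860 H_{2,0}(x) + 0.4554022 H_{2,1}(x) + 0.2818186 H_{2,2}(x) \\
& - 0.5220232 \hat{H}_{2,0}(x) - 0.5698959 \hat{H}_{2,1}(x) - 0.5811571 \hat{H}_{2,2}(x)
\end{aligned}
\]

and

\[
\begin{aligned}
H_5(1.5) = {} & 0.6200860\left(\frac{4}{27}\right) + 0.4554022\left(\frac{64}{81}\right) + 0.2818186\left(\frac{5}{81}\right) \\
& - 0.5220232\left(\frac{4}{405}\right) - 0.5698959\left(\frac{-32}{405}\right) - 0.5811571\left(\frac{-2}{405}\right) \\
= {} & 0.5118277,
\end{aligned}
\]

a result that is accurate to the places listed.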


Although Theorem 3.9 provides a complete description of the Hermite polynomials, it
is clear from Example 1 that the need to determine and evaluate the Lagrange polynomials
and their derivatives makes the procedure tedious even for small values of n.
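The hand computation is tedious, but the formula of Theorem 3.9 is straightforward to evaluate numerically at a single point. The Python sketch below is an illustration, not part of the original text. It relies on the identity $L'_{n,j}(x_j) = \sum_{k \ne j} 1/(x_j - x_k)$, which follows from differentiating the product form of $L_{n,j}$ and is an added observation rather than something stated in the section; the function name `hermite_lagrange` is hypothetical. The data are those of Table 3.15.

```python
def hermite_lagrange(xs, fs, dfs, x):
    """Evaluate H_{2n+1}(x) from Theorem 3.9 using the Lagrange basis."""
    total = 0.0
    for j, xj in enumerate(xs):
        others = [xk for k, xk in enumerate(xs) if k != j]
        # L_{n,j}(x) as a product over the other nodes.
        L = 1.0
        for xk in others:
            L *= (x - xk) / (xj - xk)
        # L'_{n,j}(x_j) = sum of 1/(x_j - x_k) over k != j.
        dL_at_xj = sum(1.0 / (xj - xk) for xk in others)
        H = (1.0 - 2.0 * (x - xj) * dL_at_xj) * L * L
        H_hat = (x - xj) * L * L
        total += fs[j] * H + dfs[j] * H_hat
    return total


# Data of Table 3.15.
xs = [1.3, 1.6, 1.9]
fs = [0.6200860, 0.4554022, 0.2818186]
dfs = [-0.5220232, -0.5698959, -0.5811571]
print(hermite_lagrange(xs, fs, dfs, 1.5))
```

This evaluates the polynomial pointwise without ever expanding the squared Lagrange factors, which is where most of the manual effort in Example 1 goes.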


Hermite Polynomials Using Divided Differences

There is an alternative method for generating Hermite approximations that has as its basis
the Newton interpolatory divided-difference formula (3.10) at $x_0, x_1, \ldots, x_n$, that is,

\[ P_n(x) = f[x_0] + \sum_{k=1}^{n} f[x_0, x_1, \ldots, x_k](x - x_0)\cdots(x - x_{k-1}). \]

The alternative method uses the connection between the nth divided difference and the nth
derivative of $f$, as outlined in Theorem 3.6 in Section 3.3.

Suppose that the distinct numbers $x_0, x_1, \ldots, x_n$ are given together with the values of
$f$ and $f'$ at these numbers. Define a new sequence $z_0, z_1, \ldots, z_{2n+1}$ by

\[ z_{2i} = z_{2i+1} = x_i, \quad \text{for each } i = 0, 1, \ldots, n, \]

and construct the divided difference table in the form of Table 3.9 that uses $z_0, z_1, \ldots, z_{2n+1}$.

Since $z_{2i} = z_{2i+1} = x_i$ for each $i$, we cannot define $f[z_{2i}, z_{2i+1}]$ by the divided difference
formula. However, if we assume, based on Theorem 3.6, that the reasonable substitution in
this situation is $f[z_{2i}, z_{2i+1}] = f'(z_{2i}) = f'(x_i)$, we can use the entries

\[ f'(x_0), f'(x_1), \ldots, f'(x_n) \]

in place of the undefined first divided differences

\[ f[z_0, z_1], f[z_2, z_3], \ldots, f[z_{2n}, z_{2n+1}]. \]

The remaining divided differences are produced as usual, and the appropriate divided differences
are employed in Newton's interpolatory divided-difference formula. Table 3.16 shows
the entries that are used for the first three divided-difference columns when determining
the Hermite polynomial $H_5(x)$ for $x_0$, $x_1$, and $x_2$. The remaining entries are generated in the
same manner as in Table 3.9. The Hermite polynomial is then given by

\[ H_{2n+1}(x) = f[z_0] + \sum_{k=1}^{2n+1} f[z_0, \ldots, z_k](x - z_0)(x - z_1)\cdots(x - z_{k-1}). \]

A proof of this fact can be found in [Pow], p. 56.
Table 3.16

\[
\begin{array}{llll}
z & f(z) & \text{First divided differences} & \text{Second divided differences} \\
\hline
z_0 = x_0 & f[z_0] = f(x_0) & & \\
 & & f[z_0, z_1] = f'(x_0) & \\
z_1 = x_0 & f[z_1] = f(x_0) & & f[z_0, z_1, z_2] = \dfrac{f[z_1, z_2] - f[z_0, z_1]}{z_2 - z_0} \\
 & & f[z_1, z_2] = \dfrac{f[z_2] - f[z_1]}{z_2 - z_1} & \\
z_2 = x_1 & f[z_2] = f(x_1) & & f[z_1, z_2, z_3] = \dfrac{f[z_2, z_3] - f[z_1, z_2]}{z_3 - z_1} \\
 & & f[z_2, z_3] = f'(x_1) & \\
z_3 = x_1 & f[z_3] = f(x_1) & & f[z_2, z_3, z_4] = \dfrac{f[z_3, z_4] - f[z_2, z_3]}{z_4 - z_2} \\
 & & f[z_3, z_4] = \dfrac{f[z_4] - f[z_3]}{z_4 - z_3} & \\
z_4 = x_2 & f[z_4] = f(x_2) & & f[z_3, z_4, z_5] = \dfrac{f[z_4, z_5] - f[z_3, z_4]}{z_5 - z_3} \\
 & & f[z_4, z_5] = f'(x_2) & \\
z_5 = x_2 & f[z_5] = f(x_2) & &
\end{array}
\]

Example 2 Use the data given in Example 1 and the divided difference method to determine the Hermite
polynomial approximation at $x = 1.5$.

Solution The underlined entries in the first three columns of Table 3.17 are the data given
in Example 1. The remaining entries in this table are generated by the standard divided-
difference formula (3.9).

For example, for the second entry in the third column we use the second 1.3 entry in
the second column and the first 1.6 entry in that column to obtain

\[ \frac{0.4554022 - 0.6200860}{1.6 - 1.3} = -0.5489460. \]

For the first entry in the fourth column we use the first 1.3 entry in the third column and the
first 1.6 entry in that column to obtain

\[ \frac{-0.5489460 - (-0.5220232)}{1.6 - 1.3} = -0.0897427. \]

The value of the Hermite polynomial at 1.5 is

\[
\begin{aligned}
H_5(1.5) = {} & f[1.3] + f'(1.3)(1.5 - 1.3) + f[1.3, 1.3, 1.6](1.5 - 1.3)^2 \\
& + f[1.3, 1.3, 1.6, 1.6](1.5 - 1.3)^2(1.5 - 1.6) \\
& + f[1.3, 1.3, 1.6, 1.6, 1.9](1.5 - 1.3)^2(1.5 - 1.6)^2 \\
& + f[1.3, 1.3, 1.6, 1.6, 1.9, 1.9](1.5 - 1.3)^2(1.5 - 1.6)^2(1.5 - 1.9) \\
= {} & 0.6200860 + (-0.5220232)(0.2) + (-0.0897427)(0.2)^2 \\
& + 0.0663657(0.2)^2(-0.1) + 0.0026663(0.2)^2(-0.1)^2 \\
& + (-0.0027738)(0.2)^2(-0.1)^2(-0.4) \\
= {} & 0.5118277.
\end{aligned}
\]
Table 3.17

1.3   0.6200860
                  −0.5220232
1.3   0.6200860                −0.0897427
                  −0.5489460                 0.0663657
1.6   0.4554022                −0.0698330                 0.0026663
                  −0.5698959                 0.0679655                 −0.0027738
1.6   0.4554022                −0.0290537                 0.0010020
                  −0.5786120                 0.0685667
1.9   0.2818186                −0.0084837
                  −0.5811571
1.9   0.2818186


The technique used in Algorithm 3.3 can be extended for use in determining other 
osculating polynomials. A concise discussion of the procedures can be found in [Pow], 
pp. 53-57. 


Hermite Interpolation 


To obtain the coefficients of the Hermite interpolating polynomial $H(x)$ on the $(n + 1)$
distinct numbers $x_0, \ldots, x_n$ for the function $f$:

INPUT numbers $x_0, x_1, \ldots, x_n$; values $f(x_0), \ldots, f(x_n)$ and $f'(x_0), \ldots, f'(x_n)$.

OUTPUT the numbers $Q_{0,0}, Q_{1,1}, \ldots, Q_{2n+1,2n+1}$ where

\[
\begin{aligned}
H(x) &= Q_{0,0} + Q_{1,1}(x - x_0) + Q_{2,2}(x - x_0)^2 + Q_{3,3}(x - x_0)^2(x - x_1) \\
&\quad + Q_{4,4}(x - x_0)^2(x - x_1)^2 + \cdots \\
&\quad + Q_{2n+1,2n+1}(x - x_0)^2(x - x_1)^2 \cdots (x - x_{n-1})^2(x - x_n).
\end{aligned}
\]

Step 1 For $i = 0, 1, \ldots, n$ do Steps 2 and 3.

Step 2 Set $z_{2i} = x_i$;
           $z_{2i+1} = x_i$;
           $Q_{2i,0} = f(x_i)$;
           $Q_{2i+1,0} = f(x_i)$;
           $Q_{2i+1,1} = f'(x_i)$.

Step 3 If $i \ne 0$ then set
           $Q_{2i,1} = \dfrac{Q_{2i,0} - Q_{2i-1,0}}{z_{2i} - z_{2i-1}}$.

Step 4 For $i = 2, 3, \ldots, 2n + 1$
           for $j = 2, 3, \ldots, i$ set $Q_{i,j} = \dfrac{Q_{i,j-1} - Q_{i-1,j-1}}{z_i - z_{i-j}}$.

Step 5 OUTPUT $(Q_{0,0}, Q_{1,1}, \ldots, Q_{2n+1,2n+1})$;
       STOP.
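The steps of the Hermite algorithm can be transcribed into Python almost directly. The following is a minimal sketch, not the text's own implementation: the function names `hermite_coefficients` and `hermite_eval` and the list-of-lists storage for the table Q are choices made here for illustration. It is run on the data of the example above (nodes 1.3, 1.6, 1.9).

```python
def hermite_coefficients(x, fx, dfx):
    """Return (z, coeffs): the doubled nodes and the diagonal entries Q[i][i]."""
    n = len(x) - 1
    m = 2 * n + 2                       # number of coefficients Q_{0,0}, ..., Q_{2n+1,2n+1}
    z = [0.0] * m
    Q = [[0.0] * m for _ in range(m)]
    # Steps 1-3: doubled nodes, function values, and first divided differences.
    for i in range(n + 1):
        z[2 * i] = z[2 * i + 1] = x[i]
        Q[2 * i][0] = Q[2 * i + 1][0] = fx[i]
        Q[2 * i + 1][1] = dfx[i]        # repeated node: use f'(x_i)
        if i != 0:
            Q[2 * i][1] = (Q[2 * i][0] - Q[2 * i - 1][0]) / (z[2 * i] - z[2 * i - 1])
    # Step 4: higher-order divided differences.
    for i in range(2, m):
        for j in range(2, i + 1):
            Q[i][j] = (Q[i][j - 1] - Q[i - 1][j - 1]) / (z[i] - z[i - j])
    return z, [Q[i][i] for i in range(m)]


def hermite_eval(z, coeffs, t):
    """Evaluate the Newton-form polynomial at t by nested multiplication."""
    value = coeffs[-1]
    for k in range(len(coeffs) - 2, -1, -1):
        value = value * (t - z[k]) + coeffs[k]
    return value


# Data of the worked example: nodes, f values, and f' values.
x = [1.3, 1.6, 1.9]
fx = [0.6200860, 0.4554022, 0.2818186]
dfx = [-0.5220232, -0.5698959, -0.5811571]

z, coeffs = hermite_coefficients(x, fx, dfx)
h_at_1_5 = hermite_eval(z, coeffs, 1.5)
print(h_at_1_5)   # close to the value 0.5118277 computed in the example
```

The returned coefficients are the top diagonal of Table 3.17; small differences in the last digits arise because the table entries were rounded at each stage.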


The NumericalAnalysis package in Maple can be used to construct the Hermite coefficients.
We first need to load the package and to define the data that is being used, in this
case, $x_i$, $f(x_i)$, and $f'(x_i)$ for $i = 0, 1, \ldots, n$. This is done by presenting the data in the form
$[x_i, f(x_i), f'(x_i)]$. For example, the data for Example 2 is entered as

    xy := [[1.3, 0.6200860, -0.5220232], [1.6, 0.4554022, -0.5698959],
           [1.9, 0.2818186, -0.5811571]]

Then the command

    h5 := PolynomialInterpolation(xy, method = hermite, independentvar = 'x')

produces an array whose nonzero entries correspond to the values in Table 3.17. The Hermite
interpolating polynomial is created with the command

    Interpolant(h5)

This gives the polynomial in (almost) Newton forward-difference form

\[
\begin{aligned}
&1.29871616 - 0.5220232x - 0.08974266667(x - 1.3)^2 + 0.06636555557(x - 1.3)^2(x - 1.6) \\
&\quad + 0.002666666633(x - 1.3)^2(x - 1.6)^2 - 0.002774691277(x - 1.3)^2(x - 1.6)^2(x - 1.9)
\end{aligned}
\]

If a standard representation of the polynomial is needed, it is found with

    expand(Interpolant(h5))

giving the Maple response

\[
\begin{aligned}
&1.001944063 - 0.0082292208x - 0.2352161732x^2 - 0.01455607812x^3 \\
&\quad + 0.02403178946x^4 - 0.002774691277x^5
\end{aligned}
\]


EXERCISE SET 3.4 


1. Use Theorem 3.9 or Algorithm 3.3 to construct an approximating polynomial for the following data. 


a.  x    | f(x)     | f'(x)            b.  x   | f(x)       | f'(x)
    8.3  | 17.56492 | 3.116256             0.8 | 0.22363362 | 2.1691753
    8.6  | 18.50515 | 3.151762             1.0 | 0.65809197 | 2.0466965

c.  x     | f(x)       | f'(x)         d.  x   | f(x)        | f'(x)
    -0.5  | -0.0247500 | 0.7510000         0.1 | -0.62049958 | 3.58502082
    -0.25 |  0.3349375 | 2.1890000         0.2 | -0.28398668 | 3.14033271
    0     |  1.1010000 | 4.0020000         0.3 |  0.00660095 | 2.66668043
                                           0.4 |  0.24842440 | 2.16529366

2. Use Theorem 3.9 or Algorithm 3.3 to construct an approximating polynomial for the following data. 


a.  x   | f(x)    | f'(x)              b.  x     | f(x)     | f'(x)
    0   | 1.00000 | 2.00000                -0.25 | 1.33203  |  0.437500
    0.5 | 2.71828 | 5.43656                 0.25 | 0.800781 | -0.625000

c.  x   | f(x)        | f'(x)          d.  x    | f(x)       | f'(x)
    0.1 | -0.29004996 | -2.8019975         -1   | 0.86199480 | 0.15536240
    0.2 | -0.56079734 | -2.6159201         -0.5 | 0.95802009 | 0.23269654
    0.3 | -0.81401972 | -2.9734038         0    | 1.0986123  | 0.33333333
                                           0.5  | 1.2943767  | 0.45186776

3. The data in Exercise 1 were generated using the following functions. Use the polynomials constructed
in Exercise 1 for the given value of x to approximate f(x), and calculate the absolute error.
a. $f(x) = x \ln x$; approximate $f(8.4)$.
b. $f(x) = \sin(e^x - 2)$; approximate $f(0.9)$.
c. $f(x) = x^3 + 4.001x^2 + 4.002x + 1.101$; approximate $f(-1/3)$.
d. $f(x) = x \cos x - 2x^2 + 3x - 1$; approximate $f(0.25)$.

4. The data in Exercise 2 were generated using the following functions. Use the polynomials constructed
in Exercise 2 for the given value of x to approximate f(x), and calculate the absolute error.
a. $f(x) = e^{2x}$; approximate $f(0.43)$.
b. $f(x) = x^4 - x^3 + x^2 - x + 1$; approximate $f(0)$.
c. $f(x) = x^2 \cos x - 3x$; approximate $f(0.18)$.
d. $f(x) = \ln(e^x + 2)$; approximate $f(0.25)$.

5. a. Use the following values and five-digit rounding arithmetic to construct the Hermite interpolating
polynomial to approximate sin 0.34.

    x    | sin x   | D_x sin x = cos x
    0.30 | 0.29552 | 0.95534
    0.32 | 0.31457 | 0.94924
    0.35 | 0.34290 | 0.93937

b. Determine an error bound for the approximation in part (a), and compare it to the actual error.
c. Add sin 0.33 = 0.32404 and cos 0.33 = 0.94604 to the data, and redo the calculations.

6. Let $f(x) = 3xe^x - e^{2x}$.
a. Approximate $f(1.03)$ by the Hermite interpolating polynomial of degree at most three using
$x_0 = 1$ and $x_1 = 1.05$. Compare the actual error to the error bound.
b. Repeat (a) with the Hermite interpolating polynomial of degree at most five, using $x_0 = 1$,
$x_1 = 1.05$, and $x_2 = 1.07$.
7. Use the error formula and Maple to find a bound for the errors in the approximations of f(x) in parts 
(a) and (c) of Exercise 3. 


8. Use the error formula and Maple to find a bound for the errors in the approximations of f(x) in parts 
(a) and (c) of Exercise 4. 


9. The following table lists data for the function described by $f(x) = e^{0.1x^2}$. Approximate $f(1.25)$ by
using $H_5(1.25)$ and $H_3(1.25)$, where $H_5$ uses the nodes $x_0 = 1$, $x_1 = 2$, and $x_2 = 3$; and $H_3$ uses the
nodes $\bar{x}_0 = 1$ and $\bar{x}_1 = 1.5$. Find error bounds for these approximations.

    x                     | f(x) = e^{0.1x^2} | f'(x) = 0.2x e^{0.1x^2}
    x_0 = \bar{x}_0 = 1   | 1.105170918       | 0.2210341836
    \bar{x}_1 = 1.5       | 1.252322716       | 0.3756968148
    x_1 = 2               | 1.491824698       | 0.5967298792
    x_2 = 3               | 2.459603111       | 1.475761867

10. A car traveling along a straight road is clocked at a number of points. The data from the observations
are given in the following table, where the time is in seconds, the distance is in feet, and the speed is
in feet per second.

    Time     | 0  | 3   | 5   | 8   | 13
    Distance | 0  | 225 | 383 | 623 | 993
    Speed    | 75 | 77  | 80  | 74  | 72

a. Use a Hermite polynomial to predict the position of the car and its speed when t = 10 s.
b. Use the derivative of the Hermite polynomial to determine whether the car ever exceeds a
55 mi/h speed limit on the road. If so, what is the first time the car exceeds this speed?
c. What is the predicted maximum speed for the car?


11. a. Show that $H_{2n+1}(x)$ is the unique polynomial of least degree agreeing with $f$ and $f'$ at $x_0, \ldots, x_n$.
[Hint: Assume that $P(x)$ is another such polynomial and consider $D = H_{2n+1} - P$ and $D'$ at
$x_0, x_1, \ldots, x_n$.]
b. Derive the error term in Theorem 3.9. [Hint: Use the same method as in the Lagrange error
derivation, Theorem 3.3, defining

\[ g(t) = f(t) - H_{2n+1}(t) - \frac{(t - x_0)^2 \cdots (t - x_n)^2}{(x - x_0)^2 \cdots (x - x_n)^2}\,[f(x) - H_{2n+1}(x)] \]

and using the fact that $g'(t)$ has $(2n + 2)$ distinct zeros in $[a, b]$.]


12. Let $z_0 = x_0$, $z_1 = x_0$, $z_2 = x_1$, and $z_3 = x_1$. Form the following divided-difference table.

    z_0 = x_0   f[z_0] = f(x_0)
                                  f[z_0, z_1] = f'(x_0)
    z_1 = x_0   f[z_1] = f(x_0)                           f[z_0, z_1, z_2]
                                  f[z_1, z_2]                                f[z_0, z_1, z_2, z_3]
    z_2 = x_1   f[z_2] = f(x_1)                           f[z_1, z_2, z_3]
                                  f[z_2, z_3] = f'(x_1)
    z_3 = x_1   f[z_3] = f(x_1)

Show that the cubic Hermite polynomial $H_3(x)$ can also be written as $f[z_0] + f[z_0, z_1](x - x_0) +
f[z_0, z_1, z_2](x - x_0)^2 + f[z_0, z_1, z_2, z_3](x - x_0)^2(x - x_1)$.


3.5 Cubic Spline Interpolation¹


The previous sections concerned the approximation of arbitrary functions on closed intervals 
using a single polynomial. However, high-degree polynomials can oscillate erratically, that 
is, a minor fluctuation over a small portion of the interval can induce large fluctuations 
over the entire range. We will see a good example of this in Figure 3.14 at the end of this 
section. 

An alternative approach is to divide the approximation interval into a collection of 
subintervals and construct a (generally) different approximating polynomial on each sub- 
interval. This is called piecewise-polynomial approximation. 


Piecewise-Polynomial Approximation 


The simplest piecewise-polynomial approximation is piecewise-linear interpolation, which 
consists of joining a set of data points 


\[ \{(x_0, f(x_0)), (x_1, f(x_1)), \ldots, (x_n, f(x_n))\} \]


by a series of straight lines, as shown in Figure 3.7. 
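Piecewise-linear interpolation is short enough to sketch in a few lines of Python. The function name `piecewise_linear` and the sample points (1, 2), (2, 3), (3, 5) are illustrative assumptions, not part of the text; the nodes are assumed to be strictly increasing.

```python
from bisect import bisect_right


def piecewise_linear(xs, ys, t):
    """Evaluate the piecewise-linear interpolant through (xs[i], ys[i]) at t.

    xs must be strictly increasing and t must lie in [xs[0], xs[-1]].
    """
    if not xs[0] <= t <= xs[-1]:
        raise ValueError("t lies outside the interpolation interval")
    # Index j of the subinterval [xs[j], xs[j+1]] that contains t.
    j = min(bisect_right(xs, t) - 1, len(xs) - 2)
    slope = (ys[j + 1] - ys[j]) / (xs[j + 1] - xs[j])
    return ys[j] + slope * (t - xs[j])


xs = [1.0, 2.0, 3.0]
ys = [2.0, 3.0, 5.0]
mid_values = [piecewise_linear(xs, ys, t) for t in (1.5, 2.0, 2.5)]
print(mid_values)
```

The slope changes from 1 to 2 at the interior node x = 2, which is exactly the loss of differentiability at the subinterval endpoints discussed next.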

A disadvantage of linear function approximation is that there is likely no differentiability
at the endpoints of the subintervals, which, in a geometrical context, means
that the interpolating function is not “smooth.” Often it is clear from physical conditions
that smoothness is required, so the approximating function must be continuously
differentiable.

An alternative procedure is to use a piecewise polynomial of Hermite type. For example,
if the values of $f$ and of $f'$ are known at each of the points $x_0 < x_1 < \cdots < x_n$, a cubic
Hermite polynomial can be used on each of the subintervals $[x_0, x_1], [x_1, x_2], \ldots, [x_{n-1}, x_n]$
to obtain a function that has a continuous derivative on the interval $[x_0, x_n]$.


¹The proofs of the theorems in this section rely on results in Chapter 6.

Isaac Jacob Schoenberg
(1903-1990) developed his work 
on splines during World War II 
while on leave from the 
University of Pennsylvania to 
work at the Army’s Ballistic 
Research Laboratory in 
Aberdeen, Maryland. His original 
work involved numerical 
procedures for solving 
differential equations. The much 
broader application of splines to 
the areas of data fitting and 
computer-aided geometric design 
became evident with the 
widespread availability of 
computers in the 1960s. 


The root of the word “spline” is 
the same as that of splint. It was 
originally a small strip of wood 
that could be used to join two 
boards. Later the word was used 
to refer to a long flexible strip, 
generally of metal, that could be 
used to draw continuous smooth 
curves by forcing the strip to pass 
through specified points and 
tracing along the curve. 


To determine the appropriate Hermite cubic polynomial on a given interval is simply 
a matter of computing H3(x) for that interval. The Lagrange interpolating polynomials 
needed to determine H3 are of first degree, so this can be accomplished without great 
difficulty. However, to use Hermite piecewise polynomials for general interpolation, we 
need to know the derivative of the function being approximated, and this is frequently 
unavailable. 

The remainder of this section considers approximation using piecewise polynomials 
that require no specific derivative information, except perhaps at the endpoints of the interval 
on which the function is being approximated. 

The simplest type of differentiable piecewise-polynomial function on an entire interval
$[x_0, x_n]$ is the function obtained by fitting one quadratic polynomial between each successive
pair of nodes. This is done by constructing a quadratic on $[x_0, x_1]$ agreeing with the function
at $x_0$ and $x_1$, another quadratic on $[x_1, x_2]$ agreeing with the function at $x_1$ and $x_2$, and so
on. A general quadratic polynomial has three arbitrary constants (the constant term, the
coefficient of $x$, and the coefficient of $x^2$) and only two conditions are required to fit the
data at the endpoints of each subinterval. So flexibility exists that permits the quadratics to
be chosen so that the interpolant has a continuous derivative on $[x_0, x_n]$. The difficulty arises
because we generally need to specify conditions about the derivative of the interpolant at
the endpoints $x_0$ and $x_n$. There is not a sufficient number of constants to ensure that the
conditions will be satisfied. (See Exercise 26.)


Cubic Splines 


The most common piecewise-polynomial approximation uses cubic polynomials between 
each successive pair of nodes and is called cubic spline interpolation. A general cubic 
polynomial involves four constants, so there is sufficient flexibility in the cubic spline pro- 
cedure to ensure that the interpolant is not only continuously differentiable on the interval, 
but also has a continuous second derivative. The construction of the cubic spline does not, 
however, assume that the derivatives of the interpolant agree with those of the function it is 
approximating, even at the nodes. (See Figure 3.8.)

Figure 3.8 (adjacent spline pieces $S_j$ and $S_{j+1}$ meeting at the node $x_{j+1}$, where
$S_j(x_{j+1}) = f(x_{j+1}) = S_{j+1}(x_{j+1})$, $S_j'(x_{j+1}) = S_{j+1}'(x_{j+1})$, and $S_j''(x_{j+1}) = S_{j+1}''(x_{j+1})$)

Definition 3.10 Given a function $f$ defined on $[a, b]$ and a set of nodes $a = x_0 < x_1 < \cdots <
x_n = b$, a cubic spline interpolant $S$ for $f$ is a function that satisfies the following
conditions:

(a) $S(x)$ is a cubic polynomial, denoted $S_j(x)$, on the subinterval $[x_j, x_{j+1}]$ for each
    $j = 0, 1, \ldots, n - 1$;
(b) $S_j(x_j) = f(x_j)$ and $S_j(x_{j+1}) = f(x_{j+1})$ for each $j = 0, 1, \ldots, n - 1$;
(c) $S_{j+1}(x_{j+1}) = S_j(x_{j+1})$ for each $j = 0, 1, \ldots, n - 2$; (Implied by (b).)
(d) $S_{j+1}'(x_{j+1}) = S_j'(x_{j+1})$ for each $j = 0, 1, \ldots, n - 2$;
(e) $S_{j+1}''(x_{j+1}) = S_j''(x_{j+1})$ for each $j = 0, 1, \ldots, n - 2$;
(f) One of the following sets of boundary conditions is satisfied:
    (i) $S''(x_0) = S''(x_n) = 0$ (natural (or free) boundary);
    (ii) $S'(x_0) = f'(x_0)$ and $S'(x_n) = f'(x_n)$ (clamped boundary).

A natural spline has no conditions
imposed for the direction at its
endpoints, so the curve takes the
shape of a straight line after it
passes through the interpolation
points nearest its endpoints. The
name derives from the fact that
this is the natural shape a flexible
strip assumes if forced to pass
through specified interpolation
points with no additional
constraints. (See Figure 3.9.)

Although cubic splines are defined with other boundary conditions, the conditions given
in (f) are sufficient for our purposes. When the free boundary conditions occur, the spline is
called a natural spline, and its graph approximates the shape that a long flexible rod would
assume if forced to go through the data points $\{(x_0, f(x_0)), (x_1, f(x_1)), \ldots, (x_n, f(x_n))\}$.

Figure 3.9

In general, clamped boundary conditions lead to more accurate approximations because 
they include more information about the function. However, for this type of boundary 
condition to hold, it is necessary to have either the values of the derivative at the endpoints 
or an accurate approximation to those values. 


Example 1 Construct a natural cubic spline that passes through the points (1, 2), (2,3), and (3, 5). 


Solution This spline consists of two cubics. The first for the interval [1, 2], denoted

\[ S_0(x) = a_0 + b_0(x - 1) + c_0(x - 1)^2 + d_0(x - 1)^3, \]

and the other for [2, 3], denoted

\[ S_1(x) = a_1 + b_1(x - 2) + c_1(x - 2)^2 + d_1(x - 2)^3. \]

There are 8 constants to be determined, which requires 8 conditions. Four conditions come
from the fact that the splines must agree with the data at the nodes. Hence

\[ 2 = f(1) = a_0, \quad 3 = f(2) = a_0 + b_0 + c_0 + d_0, \quad 3 = f(2) = a_1, \quad \text{and} \quad 5 = f(3) = a_1 + b_1 + c_1 + d_1. \]

Two more come from the fact that $S_0'(2) = S_1'(2)$ and $S_0''(2) = S_1''(2)$. These are

\[ S_0'(2) = S_1'(2):\ b_0 + 2c_0 + 3d_0 = b_1 \quad \text{and} \quad S_0''(2) = S_1''(2):\ 2c_0 + 6d_0 = 2c_1. \]

The final two come from the natural boundary conditions:

\[ S_0''(1) = 0:\ 2c_0 = 0 \quad \text{and} \quad S_1''(3) = 0:\ 2c_1 + 6d_1 = 0. \]

Solving this system of equations gives the spline

\[
S(x) = \begin{cases}
2 + \frac{3}{4}(x - 1) + \frac{1}{4}(x - 1)^3, & \text{for } x \in [1, 2], \\
3 + \frac{3}{2}(x - 2) + \frac{3}{4}(x - 2)^2 - \frac{1}{4}(x - 2)^3, & \text{for } x \in [2, 3].
\end{cases}
\]

Clamping a spline indicates that
the ends of the flexible strip are
fixed so that it is forced to take a
specific direction at each of its
endpoints. This is important, for
example, when two spline
functions should match at their
endpoints. This is done
mathematically by specifying the
values of the derivative of the
curve at the endpoints of the
spline.
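The eight conditions of Example 1 can be checked mechanically. The sketch below is an illustration added here, not part of the text: it stores the two pieces of the natural spline, with coefficients 2, 3/4, 0, 1/4 on [1, 2] and 3, 3/2, 3/4, -1/4 on [2, 3], as exact fractions and tests each condition. The helper name `value` and the dictionary layout are assumptions made for the example.

```python
from fractions import Fraction as F

# Coefficients (a, b, c, d) of each piece, expanded about its left node.
pieces = {
    1: (F(2), F(3, 4), F(0), F(1, 4)),       # S_0 on [1, 2]
    2: (F(3), F(3, 2), F(3, 4), F(-1, 4)),   # S_1 on [2, 3]
}


def value(node, x, order=0):
    """Value (order 0), first (1), or second (2) derivative of a piece at x."""
    a, b, c, d = pieces[node]
    t = x - node
    if order == 0:
        return a + b * t + c * t**2 + d * t**3
    if order == 1:
        return b + 2 * c * t + 3 * d * t**2
    return 2 * c + 6 * d * t


checks = [
    value(1, 1) == 2, value(1, 2) == 3,          # interpolation on [1, 2]
    value(2, 2) == 3, value(2, 3) == 5,          # interpolation on [2, 3]
    value(1, 2, 1) == value(2, 2, 1),            # S_0'(2) = S_1'(2)
    value(1, 2, 2) == value(2, 2, 2),            # S_0''(2) = S_1''(2)
    value(1, 1, 2) == 0, value(2, 3, 2) == 0,    # natural boundary conditions
]
print(all(checks))
```

Exact rational arithmetic is used so that each equality is tested without rounding error.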


Construction of a Cubic Spline 


As the preceding example demonstrates, a spline defined on an interval that is divided into n
subintervals will require determining 4n constants. To construct the cubic spline interpolant
for a given function f, the conditions in the definition are applied to the cubic polynomials

\[ S_j(x) = a_j + b_j(x - x_j) + c_j(x - x_j)^2 + d_j(x - x_j)^3, \]

for each $j = 0, 1, \ldots, n - 1$. Since $S_j(x_j) = a_j = f(x_j)$, condition (c) can be applied to
obtain

\[ a_{j+1} = S_{j+1}(x_{j+1}) = S_j(x_{j+1}) = a_j + b_j(x_{j+1} - x_j) + c_j(x_{j+1} - x_j)^2 + d_j(x_{j+1} - x_j)^3, \]

for each $j = 0, 1, \ldots, n - 2$.

The terms $x_{j+1} - x_j$ are used repeatedly in this development, so it is convenient to
introduce the simpler notation

\[ h_j = x_{j+1} - x_j, \]

for each $j = 0, 1, \ldots, n - 1$. If we also define $a_n = f(x_n)$, then the equation

\[ a_{j+1} = a_j + b_j h_j + c_j h_j^2 + d_j h_j^3 \tag{3.15} \]

holds for each $j = 0, 1, \ldots, n - 1$.

In a similar manner, define $b_n = S'(x_n)$ and observe that

\[ S_j'(x) = b_j + 2c_j(x - x_j) + 3d_j(x - x_j)^2 \]

implies $S_j'(x_j) = b_j$, for each $j = 0, 1, \ldots, n - 1$. Applying condition (d) gives

\[ b_{j+1} = b_j + 2c_j h_j + 3d_j h_j^2, \tag{3.16} \]

for each $j = 0, 1, \ldots, n - 1$.

Another relationship between the coefficients of $S_j$ is obtained by defining $c_n =
S''(x_n)/2$ and applying condition (e). Then, for each $j = 0, 1, \ldots, n - 1$,

\[ c_{j+1} = c_j + 3d_j h_j. \tag{3.17} \]

Solving for $d_j$ in Eq. (3.17) and substituting this value into Eqs. (3.15) and (3.16) gives,
for each $j = 0, 1, \ldots, n - 1$, the new equations

\[ a_{j+1} = a_j + b_j h_j + \frac{h_j^2}{3}(2c_j + c_{j+1}) \tag{3.18} \]

and

\[ b_{j+1} = b_j + h_j(c_j + c_{j+1}). \tag{3.19} \]

The final relationship involving the coefficients is obtained by solving the appropriate
equation in the form of equation (3.18), first for $b_j$,

\[ b_j = \frac{1}{h_j}(a_{j+1} - a_j) - \frac{h_j}{3}(2c_j + c_{j+1}), \tag{3.20} \]

and then, with a reduction of the index, for $b_{j-1}$. This gives

\[ b_{j-1} = \frac{1}{h_{j-1}}(a_j - a_{j-1}) - \frac{h_{j-1}}{3}(2c_{j-1} + c_j). \]

Substituting these values into the equation derived from Eq. (3.19), with the index reduced
by one, gives the linear system of equations

\[ h_{j-1}c_{j-1} + 2(h_{j-1} + h_j)c_j + h_j c_{j+1} = \frac{3}{h_j}(a_{j+1} - a_j) - \frac{3}{h_{j-1}}(a_j - a_{j-1}), \tag{3.21} \]

for each $j = 1, 2, \ldots, n - 1$. This system involves only the $\{c_j\}_{j=0}^{n}$ as unknowns. The values
of $\{h_j\}_{j=0}^{n-1}$ and $\{a_j\}_{j=0}^{n}$ are given, respectively, by the spacing of the nodes $\{x_j\}_{j=0}^{n}$ and the
values of $f$ at the nodes. So once the values of $\{c_j\}_{j=0}^{n}$ are determined, it is a simple matter
to find the remainder of the constants $\{b_j\}_{j=0}^{n-1}$ from Eq. (3.20) and $\{d_j\}_{j=0}^{n-1}$ from Eq. (3.17).
Then we can construct the cubic polynomials $\{S_j(x)\}_{j=0}^{n-1}$.

The major question that arises in connection with this construction is whether the values
of $\{c_j\}_{j=0}^{n}$ can be found using the system of equations given in (3.21) and, if so, whether
these values are unique. The following theorems indicate that this is the case when either of
the boundary conditions given in part (f) of the definition is imposed. The proofs of these
theorems require material from linear algebra, which is discussed in Chapter 6.


Natural Splines


Theorem 3.11 If $f$ is defined at $a = x_0 < x_1 < \cdots < x_n = b$, then $f$ has a unique natural spline interpolant
$S$ on the nodes $x_0, x_1, \ldots, x_n$; that is, a spline interpolant that satisfies the natural boundary
conditions $S''(a) = 0$ and $S''(b) = 0$.


Proof The boundary conditions in this case imply that $c_n = S''(x_n)/2 = 0$ and that

\[ 0 = S''(x_0) = 2c_0 + 6d_0(x_0 - x_0), \]

so $c_0 = 0$. The two equations $c_0 = 0$ and $c_n = 0$ together with the equations in (3.21)
produce a linear system described by the vector equation $A\mathbf{x} = \mathbf{b}$, where $A$ is the
$(n + 1) \times (n + 1)$ matrix

\[
A = \begin{bmatrix}
1 & 0 & 0 & \cdots & \cdots & 0 \\
h_0 & 2(h_0 + h_1) & h_1 & \ddots & & \vdots \\
0 & h_1 & 2(h_1 + h_2) & h_2 & \ddots & \vdots \\
\vdots & \ddots & \ddots & \ddots & \ddots & 0 \\
\vdots & & \ddots & h_{n-2} & 2(h_{n-2} + h_{n-1}) & h_{n-1} \\
0 & \cdots & \cdots & 0 & 0 & 1
\end{bmatrix},
\]

and $\mathbf{b}$ and $\mathbf{x}$ are the vectors

\[
\mathbf{b} = \begin{bmatrix}
0 \\
\frac{3}{h_1}(a_2 - a_1) - \frac{3}{h_0}(a_1 - a_0) \\
\vdots \\
\frac{3}{h_{n-1}}(a_n - a_{n-1}) - \frac{3}{h_{n-2}}(a_{n-1} - a_{n-2}) \\
0
\end{bmatrix}
\quad \text{and} \quad
\mathbf{x} = \begin{bmatrix} c_0 \\ c_1 \\ \vdots \\ c_n \end{bmatrix}.
\]

The matrix $A$ is strictly diagonally dominant, that is, in each row the magnitude of the
diagonal entry exceeds the sum of the magnitudes of all the other entries in the row. A linear
system with a matrix of this form will be shown by Theorem 6.21 in Section 6.6 to have a
unique solution for $c_0, c_1, \ldots, c_n$.


The solution to the cubic spline problem with the boundary conditions $S''(x_0) =
S''(x_n) = 0$ can be obtained by applying Algorithm 3.4.


Natural Cubic Spline 


To construct the cubic spline interpolant $S$ for the function $f$, defined at the numbers
$x_0 < x_1 < \cdots < x_n$, satisfying $S''(x_0) = S''(x_n) = 0$:

INPUT $n$; $x_0, x_1, \ldots, x_n$; $a_0 = f(x_0), a_1 = f(x_1), \ldots, a_n = f(x_n)$.

OUTPUT $a_j, b_j, c_j, d_j$ for $j = 0, 1, \ldots, n - 1$.
(Note: $S(x) = S_j(x) = a_j + b_j(x - x_j) + c_j(x - x_j)^2 + d_j(x - x_j)^3$ for $x_j \le x \le x_{j+1}$.)

Step 1 For $i = 0, 1, \ldots, n - 1$ set $h_i = x_{i+1} - x_i$.

Step 2 For $i = 1, 2, \ldots, n - 1$ set
           $\alpha_i = \dfrac{3}{h_i}(a_{i+1} - a_i) - \dfrac{3}{h_{i-1}}(a_i - a_{i-1})$.

Step 3 Set $l_0 = 1$; (Steps 3, 4, 5, and part of Step 6 solve a tridiagonal linear system
           using a method described in Algorithm 6.7.)
           $\mu_0 = 0$;
           $z_0 = 0$.

Step 4 For $i = 1, 2, \ldots, n - 1$
           set $l_i = 2(x_{i+1} - x_{i-1}) - h_{i-1}\mu_{i-1}$;
               $\mu_i = h_i / l_i$;
               $z_i = (\alpha_i - h_{i-1}z_{i-1}) / l_i$.

Step 5 Set $l_n = 1$;
           $z_n = 0$;
           $c_n = 0$.

Step 6 For $j = n - 1, n - 2, \ldots, 0$
           set $c_j = z_j - \mu_j c_{j+1}$;
               $b_j = (a_{j+1} - a_j)/h_j - h_j(c_{j+1} + 2c_j)/3$;
               $d_j = (c_{j+1} - c_j)/(3h_j)$.

Step 7 OUTPUT ($a_j, b_j, c_j, d_j$ for $j = 0, 1, \ldots, n - 1$);
       STOP.


Example 2 At the beginning of Chapter 3 we gave some Taylor polynomials to approximate the
exponential $f(x) = e^x$. Use the data points $(0, 1)$, $(1, e)$, $(2, e^2)$, and $(3, e^3)$ to form a natural
spline $S(x)$ that approximates $f(x) = e^x$.


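Solution We have $n = 3$, $h_0 = h_1 = h_2 = 1$, $a_0 = 1$, $a_1 = e$, $a_2 = e^2$, and $a_3 = e^3$. So the
matrix $A$ and the vectors $\mathbf{b}$ and $\mathbf{x}$ given in Theorem 3.11 have the forms

\[
A = \begin{bmatrix} 1 & 0 & 0 & 0 \\ 1 & 4 & 1 & 0 \\ 0 & 1 & 4 & 1 \\ 0 & 0 & 0 & 1 \end{bmatrix}, \quad
\mathbf{b} = \begin{bmatrix} 0 \\ 3(e^2 - 2e + 1) \\ 3(e^3 - 2e^2 + e) \\ 0 \end{bmatrix}, \quad \text{and} \quad
\mathbf{x} = \begin{bmatrix} c_0 \\ c_1 \\ c_2 \\ c_3 \end{bmatrix}.
\]

The vector-matrix equation $A\mathbf{x} = \mathbf{b}$ is equivalent to the system of equations

\[
\begin{aligned}
c_0 &= 0, \\
c_0 + 4c_1 + c_2 &= 3(e^2 - 2e + 1), \\
c_1 + 4c_2 + c_3 &= 3(e^3 - 2e^2 + e), \\
c_3 &= 0.
\end{aligned}
\]

This system has the solution $c_0 = c_3 = 0$, and to 5 decimal places,

\[ c_1 = \frac{1}{5}(-e^3 + 6e^2 - 9e + 4) \approx 0.75685, \quad \text{and} \quad c_2 = \frac{1}{5}(4e^3 - 9e^2 + 6e - 1) \approx 5.83007. \]

Solving for the remaining constants gives

\[
\begin{aligned}
b_0 &= \frac{1}{h_0}(a_1 - a_0) - \frac{h_0}{3}(c_1 + 2c_0)
     = (e - 1) - \frac{1}{15}(-e^3 + 6e^2 - 9e + 4) \approx 1.46600, \\
b_1 &= \frac{1}{h_1}(a_2 - a_1) - \frac{h_1}{3}(c_2 + 2c_1)
     = (e^2 - e) - \frac{1}{15}(2e^3 + 3e^2 - 12e + 7) \approx 2.22285, \\
b_2 &= \frac{1}{h_2}(a_3 - a_2) - \frac{h_2}{3}(c_3 + 2c_2)
     = (e^3 - e^2) - \frac{1}{15}(8e^3 - 18e^2 + 12e - 2) \approx 8.80977, \\
d_0 &= \frac{1}{3h_0}(c_1 - c_0) = \frac{1}{15}(-e^3 + 6e^2 - 9e + 4) \approx 0.25228, \\
d_1 &= \frac{1}{3h_1}(c_2 - c_1) = \frac{1}{3}(e^3 - 3e^2 + 3e - 1) \approx 1.69107,
\end{aligned}
\]

and

\[ d_2 = \frac{1}{3h_2}(c_3 - c_2) = \frac{1}{15}(-4e^3 + 9e^2 - 6e + 1) \approx -1.94336. \]

The natural cubic spline is described piecewise by

\[
S(x) = \begin{cases}
1 + 1.46600x + 0.25228x^3, & \text{for } x \in [0, 1], \\
2.71828 + 2.22285(x - 1) + 0.75685(x - 1)^2 + 1.69107(x - 1)^3, & \text{for } x \in [1, 2], \\
7.38906 + 8.80977(x - 2) + 5.83007(x - 2)^2 - 1.94336(x - 2)^3, & \text{for } x \in [2, 3].
\end{cases}
\]

The spline and its agreement with $f(x) = e^x$ are shown in Figure 3.10.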
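The Natural Cubic Spline algorithm (Algorithm 3.4) can be transcribed into Python nearly step for step. The following is a sketch under that reading of the algorithm, not code from the text; the function name `natural_cubic_spline` and the use of plain lists are assumptions made here. It is applied to the data of Example 2.

```python
import math


def natural_cubic_spline(x, a):
    """Return (b, c, d) so that piece j is a[j] + b[j]*t + c[j]*t**2 + d[j]*t**3, t = s - x[j]."""
    n = len(x) - 1
    h = [x[i + 1] - x[i] for i in range(n)]                       # Step 1
    alpha = [0.0] * (n + 1)
    for i in range(1, n):                                         # Step 2
        alpha[i] = 3 * (a[i + 1] - a[i]) / h[i] - 3 * (a[i] - a[i - 1]) / h[i - 1]
    l = [1.0] * (n + 1)                                           # Step 3: l_0 = 1
    mu = [0.0] * (n + 1)                                          #         mu_0 = 0
    z = [0.0] * (n + 1)                                           #         z_0 = 0
    for i in range(1, n):                                         # Step 4
        l[i] = 2 * (x[i + 1] - x[i - 1]) - h[i - 1] * mu[i - 1]
        mu[i] = h[i] / l[i]
        z[i] = (alpha[i] - h[i - 1] * z[i - 1]) / l[i]
    c = [0.0] * (n + 1)                                           # Step 5: c_n = 0
    b = [0.0] * n
    d = [0.0] * n
    for j in range(n - 1, -1, -1):                                # Step 6
        c[j] = z[j] - mu[j] * c[j + 1]
        b[j] = (a[j + 1] - a[j]) / h[j] - h[j] * (c[j + 1] + 2 * c[j]) / 3
        d[j] = (c[j + 1] - c[j]) / (3 * h[j])
    return b, c[:n], d


# Data of Example 2: f(x) = e^x sampled at 0, 1, 2, 3.
x = [0.0, 1.0, 2.0, 3.0]
a = [math.exp(t) for t in x]
b, c, d = natural_cubic_spline(x, a)
for j in range(3):
    print(j, round(a[j], 5), round(b[j], 5), round(c[j], 5), round(d[j], 5))
```

To five decimal places the printed coefficients should agree with those found by hand in Example 2.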


Figure 3.10


The NumericalAnalysis package can be used to create a cubic spline in a manner similar
to other constructions in this chapter. However, the CurveFitting Package in Maple can also
be used, and since this has not been discussed previously we will use it to create the natural
spline in Example 2. First we load the package with the command

    with(CurveFitting)

and define the function being approximated with

    f := x → e^x

To create a spline we need to specify the nodes, variable, the degree, and the natural
endpoints. This is done with

    sn := t → Spline([[0., 1.0], [1.0, f(1.0)], [2.0, f(2.0)], [3.0, f(3.0)]], t, degree = 3,
                     endpoints = 'natural')

Maple returns

    t → CurveFitting:-Spline([[0., 1.0], [1.0, f(1.0)], [2.0, f(2.0)], [3.0, f(3.0)]], t,
                             degree = 3, endpoints = 'natural')

The form of the natural spline is seen with the command

    sn(t)

which produces

\[
\begin{cases}
1. + 1.465998\,t + 0.2522848\,t^3, & t < 1.0 \\
0.495432 + 2.222851\,t + 0.756853(t - 1.0)^2 + 1.691071(t - 1.0)^3, & t < 2.0 \\
-10.230483 + 8.809770\,t + 5.830067(t - 2.0)^2 - 1.943356(t - 2.0)^3, & \text{otherwise}
\end{cases}
\]

Once we have determined a spline approximation for a function we can use it to
approximate other properties of the function. The next illustration involves the integral
of the spline we found in the previous example.


Illustration To approximate the integral of $f(x) = e^x$ on $[0, 3]$, which has the value

\[ \int_0^3 e^x\,dx = e^3 - 1 \approx 20.08553692 - 1 = 19.08553692, \]

we can piecewise integrate the spline that approximates $f$ on this interval. This gives

\[
\begin{aligned}
\int_0^3 S(x)\,dx &= \int_0^1 \left(1 + 1.46600x + 0.25228x^3\right) dx \\
&\quad + \int_1^2 \left(2.71828 + 2.22285(x - 1) + 0.75685(x - 1)^2 + 1.69107(x - 1)^3\right) dx \\
&\quad + \int_2^3 \left(7.38906 + 8.80977(x - 2) + 5.83007(x - 2)^2 - 1.94336(x - 2)^3\right) dx.
\end{aligned}
\]

Integrating and collecting values from like powers gives

\[
\begin{aligned}
\int_0^3 S(x)\,dx &= \left[x + 1.46600\frac{x^2}{2} + 0.25228\frac{x^4}{4}\right]_0^1 \\
&\quad + \left[2.71828(x - 1) + 2.22285\frac{(x - 1)^2}{2} + 0.75685\frac{(x - 1)^3}{3} + 1.69107\frac{(x - 1)^4}{4}\right]_1^2 \\
&\quad + \left[7.38906(x - 2) + 8.80977\frac{(x - 2)^2}{2} + 5.83007\frac{(x - 2)^3}{3} - 1.94336\frac{(x - 2)^4}{4}\right]_2^3 \\
&= (1 + 2.71828 + 7.38906) + \frac{1}{2}(1.46600 + 2.22285 + 8.80977) \\
&\quad + \frac{1}{3}(0.75685 + 5.83007) + \frac{1}{4}(0.25228 + 1.69107 - 1.94336) \\
&= 19.55229.
\end{aligned}
\]

Because the nodes are equally spaced in this example, with spacing $h_j = 1$, the integral
approximation is simply

\[ \int_0^3 S(x)\,dx = (a_0 + a_1 + a_2) + \frac{1}{2}(b_0 + b_1 + b_2) + \frac{1}{3}(c_0 + c_1 + c_2) + \frac{1}{4}(d_0 + d_1 + d_2). \tag{3.22} \]

If we create the natural spline using Maple as described after Example 2, we can then
use Maple’s integration command to find the value in the Illustration. Simply enter

    int(sn(t), t = 0 .. 3)

    19.55228648

Clamped Splines 


Example 3 In Example 1 we found a natural spline $S$ that passes through the points (1, 2), (2, 3),
and (3, 5). Construct a clamped spline $s$ through these points that has $s'(1) = 2$ and
$s'(3) = 1$.


Solution Let

\[ s_0(x) = a_0 + b_0(x - 1) + c_0(x - 1)^2 + d_0(x - 1)^3 \]

be the cubic on [1, 2] and the cubic on [2, 3] be

\[ s_1(x) = a_1 + b_1(x - 2) + c_1(x - 2)^2 + d_1(x - 2)^3. \]

Then most of the conditions to determine the 8 constants are the same as those in Example
1. That is,

\[ 2 = f(1) = a_0, \quad 3 = f(2) = a_0 + b_0 + c_0 + d_0, \quad 3 = f(2) = a_1, \quad \text{and} \quad 5 = f(3) = a_1 + b_1 + c_1 + d_1, \]

\[ s_0'(2) = s_1'(2):\ b_0 + 2c_0 + 3d_0 = b_1 \quad \text{and} \quad s_0''(2) = s_1''(2):\ 2c_0 + 6d_0 = 2c_1. \]

However, the boundary conditions are now

\[ s_0'(1) = 2:\ b_0 = 2 \quad \text{and} \quad s_1'(3) = 1:\ b_1 + 2c_1 + 3d_1 = 1. \]

Solving this system of equations gives the spline as

\[
s(x) = \begin{cases}
2 + 2(x - 1) - \frac{5}{2}(x - 1)^2 + \frac{3}{2}(x - 1)^3, & \text{for } x \in [1, 2], \\
3 + \frac{3}{2}(x - 2) + 2(x - 2)^2 - \frac{3}{2}(x - 2)^3, & \text{for } x \in [2, 3].
\end{cases}
\]


In the case of general clamped boundary conditions we have a result that is similar to 
the theorem for natural boundary conditions described in Theorem 3.11. 


Theorem 3.12 If $f$ is defined at $a = x_0 < x_1 < \cdots < x_n = b$ and differentiable at $a$ and $b$, then $f$ has a
unique clamped spline interpolant $S$ on the nodes $x_0, x_1, \ldots, x_n$; that is, a spline interpolant
that satisfies the clamped boundary conditions $S'(a) = f'(a)$ and $S'(b) = f'(b)$.


Proof Since $f'(a) = S'(a) = S'(x_0) = b_0$, Eq. (3.20) with $j = 0$ implies

\[ f'(a) = \frac{1}{h_0}(a_1 - a_0) - \frac{h_0}{3}(2c_0 + c_1). \]

Consequently,

\[ 2h_0 c_0 + h_0 c_1 = \frac{3}{h_0}(a_1 - a_0) - 3f'(a). \]

Similarly,

\[ f'(b) = b_n = b_{n-1} + h_{n-1}(c_{n-1} + c_n), \]

so Eq. (3.20) with $j = n - 1$ implies that

\[
\begin{aligned}
f'(b) &= \frac{a_n - a_{n-1}}{h_{n-1}} - \frac{h_{n-1}}{3}(2c_{n-1} + c_n) + h_{n-1}(c_{n-1} + c_n) \\
&= \frac{a_n - a_{n-1}}{h_{n-1}} + \frac{h_{n-1}}{3}(c_{n-1} + 2c_n),
\end{aligned}
\]

and

\[ h_{n-1}c_{n-1} + 2h_{n-1}c_n = 3f'(b) - \frac{3}{h_{n-1}}(a_n - a_{n-1}). \]

Equations (3.21) together with the equations

\[ 2h_0 c_0 + h_0 c_1 = \frac{3}{h_0}(a_1 - a_0) - 3f'(a) \]

and

\[ h_{n-1}c_{n-1} + 2h_{n-1}c_n = 3f'(b) - \frac{3}{h_{n-1}}(a_n - a_{n-1}) \]

determine the linear system $A\mathbf{x} = \mathbf{b}$, where

\[
A = \begin{bmatrix}
2h_0 & h_0 & 0 & \cdots & \cdots & 0 \\
h_0 & 2(h_0 + h_1) & h_1 & \ddots & & \vdots \\
0 & h_1 & 2(h_1 + h_2) & h_2 & \ddots & \vdots \\
\vdots & \ddots & \ddots & \ddots & \ddots & 0 \\
\vdots & & \ddots & h_{n-2} & 2(h_{n-2} + h_{n-1}) & h_{n-1} \\
0 & \cdots & \cdots & 0 & h_{n-1} & 2h_{n-1}
\end{bmatrix},
\]

\[
\mathbf{b} = \begin{bmatrix}
\frac{3}{h_0}(a_1 - a_0) - 3f'(a) \\
\frac{3}{h_1}(a_2 - a_1) - \frac{3}{h_0}(a_1 - a_0) \\
\vdots \\
\frac{3}{h_{n-1}}(a_n - a_{n-1}) - \frac{3}{h_{n-2}}(a_{n-1} - a_{n-2}) \\
3f'(b) - \frac{3}{h_{n-1}}(a_n - a_{n-1})
\end{bmatrix},
\quad \text{and} \quad
\mathbf{x} = \begin{bmatrix} c_0 \\ c_1 \\ \vdots \\ c_n \end{bmatrix}.
\]

This matrix $A$ is also strictly diagonally dominant, so it satisfies the conditions of
Theorem 6.21 in Section 6.6. Therefore, the linear system has a unique solution for
$c_0, c_1, \ldots, c_n$.


The solution to the cubic spline problem with the boundary conditions $S'(x_0) = f'(x_0)$
and $S'(x_n) = f'(x_n)$ can be obtained by applying Algorithm 3.5.


Clamped Cubic Spline 


To construct the cubic spline interpolant $S$ for the function $f$ defined at the numbers $x_0 <
x_1 < \cdots < x_n$, satisfying $S'(x_0) = f'(x_0)$ and $S'(x_n) = f'(x_n)$:

INPUT $n$; $x_0, x_1, \ldots, x_n$; $a_0 = f(x_0), a_1 = f(x_1), \ldots, a_n = f(x_n)$; $FPO = f'(x_0)$;
$FPN = f'(x_n)$.

OUTPUT $a_j, b_j, c_j, d_j$ for $j = 0, 1, \ldots, n - 1$.
(Note: $S(x) = S_j(x) = a_j + b_j(x - x_j) + c_j(x - x_j)^2 + d_j(x - x_j)^3$ for $x_j \le x \le x_{j+1}$.)

Step 1 For $i = 0, 1, \ldots, n - 1$ set $h_i = x_{i+1} - x_i$.

Step 2 Set $\alpha_0 = 3(a_1 - a_0)/h_0 - 3FPO$;
           $\alpha_n = 3FPN - 3(a_n - a_{n-1})/h_{n-1}$.

Step 3 For $i = 1, 2, \ldots, n - 1$
           set $\alpha_i = \dfrac{3}{h_i}(a_{i+1} - a_i) - \dfrac{3}{h_{i-1}}(a_i - a_{i-1})$.

Step 4 Set $l_0 = 2h_0$; (Steps 4, 5, 6, and part of Step 7 solve a tridiagonal linear system
           using a method described in Algorithm 6.7.)
           $\mu_0 = 0.5$;
           $z_0 = \alpha_0 / l_0$.

Step 5 For $i = 1, 2, \ldots, n - 1$
           set $l_i = 2(x_{i+1} - x_{i-1}) - h_{i-1}\mu_{i-1}$;
               $\mu_i = h_i / l_i$;
               $z_i = (\alpha_i - h_{i-1}z_{i-1}) / l_i$.

Step 6 Set $l_n = h_{n-1}(2 - \mu_{n-1})$;
           $z_n = (\alpha_n - h_{n-1}z_{n-1}) / l_n$;
           $c_n = z_n$.

Step 7 For $j = n - 1, n - 2, \ldots, 0$
           set $c_j = z_j - \mu_j c_{j+1}$;
               $b_j = (a_{j+1} - a_j)/h_j - h_j(c_{j+1} + 2c_j)/3$;
               $d_j = (c_{j+1} - c_j)/(3h_j)$.

Step 8 OUTPUT ($a_j, b_j, c_j, d_j$ for $j = 0, 1, \ldots, n - 1$);
       STOP.


Example 4 Example 2 used a natural spline and the data points $(0, 1)$, $(1, e)$, $(2, e^2)$, and $(3, e^3)$ to form
a new approximating function $S(x)$. Determine the clamped spline $s(x)$ that uses this data
and the additional information that, since $f'(x) = e^x$, $f'(0) = 1$ and $f'(3) = e^3$.


Solution As in Example 2, we have $n = 3$, $h_0 = h_1 = h_2 = 1$, $a_0 = 1$, $a_1 = e$, $a_2 = e^2$,
and $a_3 = e^3$. This together with the information that $f'(0) = 1$ and $f'(3) = e^3$ gives
the matrix $A$ and the vectors $\mathbf{b}$ and $\mathbf{x}$ with the forms

\[
A = \begin{bmatrix} 2 & 1 & 0 & 0 \\ 1 & 4 & 1 & 0 \\ 0 & 1 & 4 & 1 \\ 0 & 0 & 1 & 2 \end{bmatrix}, \quad
\mathbf{b} = \begin{bmatrix} 3(e - 2) \\ 3(e^2 - 2e + 1) \\ 3(e^3 - 2e^2 + e) \\ 3e^2 \end{bmatrix}, \quad \text{and} \quad
\mathbf{x} = \begin{bmatrix} c_0 \\ c_1 \\ c_2 \\ c_3 \end{bmatrix}.
\]

The vector-matrix equation $A\mathbf{x} = \mathbf{b}$ is equivalent to the system of equations

\[
\begin{aligned}
2c_0 + c_1 &= 3(e - 2), \\
c_0 + 4c_1 + c_2 &= 3(e^2 - 2e + 1), \\
c_1 + 4c_2 + c_3 &= 3(e^3 - 2e^2 + e), \\
c_2 + 2c_3 &= 3e^2.
\end{aligned}
\]

Solving this system simultaneously for $c_0$, $c_1$, $c_2$, and $c_3$ gives, to 5 decimal places,

\[
\begin{aligned}
c_0 &= \frac{1}{15}(2e^3 - 12e^2 + 42e - 59) = 0.44468, \\
c_1 &= \frac{1}{15}(-4e^3 + 24e^2 - 39e + 28) = 1.26548, \\
c_2 &= \frac{1}{15}(14e^3 - 39e^2 + 24e - 8) = 3.35087, \\
c_3 &= \frac{1}{15}(-7e^3 + 42e^2 - 12e + 4) = 9.40815.
\end{aligned}
\]

Solving for the remaining constants in the same manner as Example 2 gives

\[ b_0 = 1.00000, \quad b_1 = 2.71016, \quad b_2 = 7.32652, \]

and

\[ d_0 = 0.27360, \quad d_1 = 0.69513, \quad d_2 = 2.01909. \]






This gives the clamped cubic spline

\[
s(x) = \begin{cases}
1 + x + 0.44468x^2 + 0.27360x^3, & \text{if } 0 \le x < 1, \\
2.71828 + 2.71016(x - 1) + 1.26548(x - 1)^2 + 0.69513(x - 1)^3, & \text{if } 1 \le x < 2, \\
7.38906 + 7.32652(x - 2) + 3.35087(x - 2)^2 + 2.01909(x - 2)^3, & \text{if } 2 \le x \le 3.
\end{cases}
\]

The graph of the clamped spline and $f(x) = e^x$ are so similar that no difference can be
seen.
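The Clamped Cubic Spline algorithm (Algorithm 3.5) differs from the natural case only in the first and last equations of the tridiagonal system. The sketch below is one possible Python transcription, offered as an illustration rather than as the text's implementation; the function name `clamped_cubic_spline` and the argument names `fp0` and `fpn` (standing in for FPO and FPN) are assumptions. It is run on the data of Example 4.

```python
import math


def clamped_cubic_spline(x, a, fp0, fpn):
    """Return (b, c, d) for the clamped spline with S'(x[0]) = fp0 and S'(x[-1]) = fpn."""
    n = len(x) - 1
    h = [x[i + 1] - x[i] for i in range(n)]                       # Step 1
    alpha = [0.0] * (n + 1)
    alpha[0] = 3 * (a[1] - a[0]) / h[0] - 3 * fp0                 # Step 2
    alpha[n] = 3 * fpn - 3 * (a[n] - a[n - 1]) / h[n - 1]
    for i in range(1, n):                                         # Step 3
        alpha[i] = 3 * (a[i + 1] - a[i]) / h[i] - 3 * (a[i] - a[i - 1]) / h[i - 1]
    l = [0.0] * (n + 1)
    mu = [0.0] * (n + 1)
    z = [0.0] * (n + 1)
    l[0] = 2 * h[0]                                               # Step 4
    mu[0] = 0.5
    z[0] = alpha[0] / l[0]
    for i in range(1, n):                                         # Step 5
        l[i] = 2 * (x[i + 1] - x[i - 1]) - h[i - 1] * mu[i - 1]
        mu[i] = h[i] / l[i]
        z[i] = (alpha[i] - h[i - 1] * z[i - 1]) / l[i]
    l[n] = h[n - 1] * (2 - mu[n - 1])                             # Step 6
    z[n] = (alpha[n] - h[n - 1] * z[n - 1]) / l[n]
    c = [0.0] * (n + 1)
    c[n] = z[n]
    b = [0.0] * n
    d = [0.0] * n
    for j in range(n - 1, -1, -1):                                # Step 7
        c[j] = z[j] - mu[j] * c[j + 1]
        b[j] = (a[j + 1] - a[j]) / h[j] - h[j] * (c[j + 1] + 2 * c[j]) / 3
        d[j] = (c[j + 1] - c[j]) / (3 * h[j])
    return b, c[:n], d


# Data of Example 4: f(x) = e^x at 0, 1, 2, 3 with f'(0) = 1 and f'(3) = e^3.
x = [0.0, 1.0, 2.0, 3.0]
a = [math.exp(t) for t in x]
b, c, d = clamped_cubic_spline(x, a, 1.0, math.exp(3.0))
for j in range(3):
    print(j, round(b[j], 5), round(c[j], 5), round(d[j], 5))
```

Called with the points (1, 2), (2, 3), (3, 5) and end slopes 2 and 1, the same function should return the coefficients of the clamped spline found in Example 3.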


We can create the clamped cubic spline in Example 4 with the same commands we
used for the natural spline; the only change that is needed is to specify the derivative at the
endpoints. In this case we use

    sn := t → Spline([[0., 1.0], [1.0, f(1.0)], [2.0, f(2.0)], [3.0, f(3.0)]], t, degree = 3,
                     endpoints = [1.0, e^3.0])

giving essentially the same results as in the example.
We can also approximate the integral of $f$ on $[0, 3]$ by integrating the clamped spline.
The exact value of the integral is

\[ \int_0^3 e^x\,dx = e^3 - 1 \approx 20.08554 - 1 = 19.08554. \]

Because the data is equally spaced, with spacing 1, piecewise integrating the clamped spline
results in the same formula as in (3.22), that is,

\[ \int_0^3 s(x)\,dx = (a_0 + a_1 + a_2) + \frac{1}{2}(b_0 + b_1 + b_2) + \frac{1}{3}(c_0 + c_1 + c_2) + \frac{1}{4}(d_0 + d_1 + d_2). \]

Hence the integral approximation is

\[
\begin{aligned}
\int_0^3 s(x)\,dx &= (1 + 2.71828 + 7.38906) + \frac{1}{2}(1 + 2.71016 + 7.32652) \\
&\quad + \frac{1}{3}(0.44468 + 1.26548 + 3.35087) + \frac{1}{4}(0.27360 + 0.69513 + 2.01909) \\
&= 19.05965.
\end{aligned}
\]

The absolute errors in the integral approximations using the clamped and natural splines are

    Natural: |19.08554 - 19.55229| = 0.46675

and

    Clamped: |19.08554 - 19.05965| = 0.02589.

For integration purposes the clamped spline is vastly superior. This should be no surprise
since the boundary conditions for the clamped spline are exact, whereas for the natural
spline we are essentially assuming that, since $f''(x) = e^x$,

\[ 0 = S''(0) \approx f''(0) = e^0 = 1 \quad \text{and} \quad 0 = S''(3) \approx f''(3) = e^3 \approx 20. \]


The next illustration uses a spline to approximate a curve that has no given functional
representation.

Illustration Figure 3.11 shows a ruddy duck in flight. To approximate the top profile of the duck, we
have chosen points along the curve through which we want the approximating curve to pass.
Table 3.18 lists the coordinates of 21 data points relative to the superimposed coordinate
system shown in Figure 3.12. Notice that more points are used when the curve is changing
rapidly than when it is changing more slowly.


Figure 3.11

Table 3.18 


x    | 0.9 | 1.3 | 1.9  | 2.1 | 2.6 | 3.0 | 3.9 | 4.4  | 4.7  | 5.0 | 6.0  | 7.0 | 8.0  | 9.2  | 10.5 | 11.3 | 11.6 | 12.0 | 12.6 | 13.0 | 13.3
f(x) | 1.3 | 1.5 | 1.85 | 2.1 | 2.6 | 2.7 | 2.4 | 2.15 | 2.05 | 2.1 | 2.25 | 2.3 | 2.25 | 1.95 | 1.4  | 0.9  | 0.7  | 0.6  | 0.5  | 0.4  | 0.25


Figure 3.12 


Using Algorithm 3.4 to generate the natural cubic spline for this data produces the coeffi- 
cients shown in Table 3.19. This spline curve is nearly identical to the profile, as shown in 
Figure 3.13.

Table 3.19
 j | x_j  | a_j  | b_j   | c_j   | d_j
 0 | 0.9  | 1.3  |  0.54 |  0.00 | -0.25
 1 | 1.3  | 1.5  |  0.42 | -0.30 |  0.95
 2 | 1.9  | 1.85 |  1.09 |  1.41 | -2.96
 3 | 2.1  | 2.1  |  1.29 | -0.37 | -0.45
 4 | 2.6  | 2.6  |  0.59 | -1.04 |  0.45
 5 | 3.0  | 2.7  | -0.02 | -0.50 |  0.17
 6 | 3.9  | 2.4  | -0.50 | -0.03 |  0.08
 7 | 4.4  | 2.15 | -0.48 |  0.08 |  1.31
 8 | 4.7  | 2.05 | -0.07 |  1.27 | -1.58
 9 | 5.0  | 2.1  |  0.26 | -0.16 |  0.04
10 | 6.0  | 2.25 |  0.08 | -0.03 |  0.00
11 | 7.0  | 2.3  |  0.01 | -0.04 | -0.02
12 | 8.0  | 2.25 | -0.14 | -0.11 |  0.02
13 | 9.2  | 1.95 | -0.34 | -0.05 | -0.01
14 | 10.5 | 1.4  | -0.53 | -0.10 | -0.02
15 | 11.3 | 0.9  | -0.73 | -0.15 |  1.21
16 | 11.6 | 0.7  | -0.49 |  0.94 | -0.84
17 | 12.0 | 0.6  | -0.14 | -0.06 |  0.04
18 | 12.6 | 0.5  | -0.18 |  0.00 | -0.45
19 | 13.0 | 0.4  | -0.39 | -0.54 |  0.60
20 | 13.3 | 0.25 |
Figure 3.13 
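The coefficients for the duck data can be regenerated with a few lines of code. The sketch below is a Python rendering of the usual tridiagonal solution for a natural cubic spline, written in the spirit of Algorithm 3.4, which is not reproduced in this section. The function name `natural_cubic_spline` and the variable names are assumptions made for this illustration.

```python
def natural_cubic_spline(x, a):
    """Coefficients (b, c, d) of the natural cubic spline through the points (x[j], a[j]).

    On [x[j], x[j+1]] the spline is
        a[j] + b[j]*(t - x[j]) + c[j]*(t - x[j])**2 + d[j]*(t - x[j])**3.
    """
    n = len(x) - 1
    h = [x[j + 1] - x[j] for j in range(n)]

    # Right-hand side of the tridiagonal system for the interior c_j.
    alpha = [0.0] * (n + 1)
    for j in range(1, n):
        alpha[j] = 3 * (a[j + 1] - a[j]) / h[j] - 3 * (a[j] - a[j - 1]) / h[j - 1]

    # Factor and forward-substitute; c_0 = c_n = 0 are the natural conditions.
    l = [1.0] * (n + 1)
    mu = [0.0] * (n + 1)
    z = [0.0] * (n + 1)
    for j in range(1, n):
        l[j] = 2 * (x[j + 1] - x[j - 1]) - h[j - 1] * mu[j - 1]
        mu[j] = h[j] / l[j]
        z[j] = (alpha[j] - h[j - 1] * z[j - 1]) / l[j]

    # Back-substitute for c, then recover b and d on each subinterval.
    c = [0.0] * (n + 1)
    b = [0.0] * n
    d = [0.0] * n
    for j in range(n - 1, -1, -1):
        c[j] = z[j] - mu[j] * c[j + 1]
        b[j] = (a[j + 1] - a[j]) / h[j] - h[j] * (c[j + 1] + 2 * c[j]) / 3
        d[j] = (c[j + 1] - c[j]) / (3 * h[j])
    return b, c, d

# Data of Table 3.18 (top profile of the duck).
xs = [0.9, 1.3, 1.9, 2.1, 2.6, 3.0, 3.9, 4.4, 4.7, 5.0, 6.0,
      7.0, 8.0, 9.2, 10.5, 11.3, 11.6, 12.0, 12.6, 13.0, 13.3]
ys = [1.3, 1.5, 1.85, 2.1, 2.6, 2.7, 2.4, 2.15, 2.05, 2.1, 2.25,
      2.3, 2.25, 1.95, 1.4, 0.9, 0.7, 0.6, 0.5, 0.4, 0.25]

b, c, d = natural_cubic_spline(xs, ys)
for j in range(len(xs) - 1):
    print(f"{j:2d} {xs[j]:5.1f} {ys[j]:5.2f} {b[j]:6.2f} {c[j]:6.2f} {d[j]:6.2f}")
```

Rounded to two decimal places, the printed rows should correspond to the rows of Table 3.19. The work is proportional to the number of nodes, which is why splines remain practical for much larger data sets than this one.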


For comparison purposes, Figure 3.14 gives an illustration of the curve that is generated using 
a Lagrange interpolating polynomial to fit the data given in Table 3.18. The interpolating 
polynomial in this case is of degree 20 and oscillates wildly. It produces a very strange 
illustration of the back of a duck, in flight or otherwise.

Figure 3.14


Theorem 3.13 




To use a clamped spline to approximate this curve we would need derivative approximations
for the endpoints. Even if these approximations were available, we could expect little
improvement because of the close agreement of the natural cubic spline to the curve of the
top profile.


Constructing a cubic spline to approximate the lower profile of the ruddy duck would 
be more difficult since the curve for this portion cannot be expressed as a function of x, and 
at certain points the curve does not appear to be smooth. These problems can be resolved 
by using separate splines to represent various portions of the curve, but a more effective 
approach to approximating curves of this type is considered in the next section. 

The clamped boundary conditions are generally preferred when approximating func- 
tions by cubic splines, so the derivative of the function must be known or approximated 
at the endpoints of the interval. When the nodes are equally spaced near both end- 
points, approximations can be obtained by any of the appropriate formulas given in 
Sections 4.1 and 4.2. When the nodes are unequally spaced, the problem is considerably 
more difficult. 

To conclude this section, we list an error-bound formula for the cubic spline with 
clamped boundary conditions. The proof of this result can be found in [Schul], pp. 57-58. 


Let $f \in C^4[a,b]$ with $\max_{a \le x \le b} |f^{(4)}(x)| = M$. If $S$ is the unique clamped cubic spline
interpolant to $f$ with respect to the nodes $a = x_0 < x_1 < \cdots < x_n = b$, then for all $x$ in
$[a,b]$,

\[
|f(x) - S(x)| \le \frac{5M}{384} \max_{0 \le j \le n-1} (x_{j+1} - x_j)^4.
\]


A fourth-order error-bound result also holds in the case of natural boundary conditions, 
but it is more difficult to express. (See [BD], pp. 827-835.) 
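As a quick illustration of how the clamped-spline bound is used, the following Python sketch evaluates $\frac{5M}{384}\max_j (x_{j+1}-x_j)^4$ for the data of Example 4, where $f(x) = e^x$ on $[0, 3]$, so that $M = e^3$ and the largest node spacing is 1. The function name `clamped_spline_error_bound` is an assumption introduced only for this example.

```python
import math

def clamped_spline_error_bound(M, nodes):
    """Bound (5M/384) * max_j (x_{j+1} - x_j)^4 for a clamped cubic spline interpolant."""
    h_max = max(right - left for left, right in zip(nodes, nodes[1:]))
    return 5.0 * M / 384.0 * h_max**4

# Example 4: f(x) = e^x on [0, 3], so max |f''''| = e^3, with nodes 0, 1, 2, 3.
bound = clamped_spline_error_bound(math.exp(3), [0.0, 1.0, 2.0, 3.0])
print(f"error bound = {bound:.5f}")

# Halving the spacing divides the bound by 2**4 = 16.
finer = clamped_spline_error_bound(math.exp(3), [0.5 * k for k in range(7)])
print(f"bound with h = 0.5: {finer:.5f}")
```

The bound is a worst case over the whole interval; the observed error of a particular spline is often considerably smaller.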

The natural boundary conditions will generally give less accurate results than the
clamped conditions near the ends of the interval $[x_0, x_n]$ unless the function $f$ happens
to nearly satisfy $f''(x_0) = f''(x_n) = 0$. An alternative to the natural boundary condition
that does not require knowledge of the derivative of $f$ is the not-a-knot condition (see
[Deb2], pp. 55-56). This condition requires that $S'''(x)$ be continuous at $x_1$ and at $x_{n-1}$.


EXERCISE SET 3.5 


1. Determine the natural cubic spline S that interpolates the data f(0) = 0, f(1) = 1, and f(2) = 2.
2. Determine the clamped cubic spline s that interpolates the data f(0) = 0, f(1) = 1, f(2) = 2 and
satisfies s'(0) = s'(2) = 1.


3. Construct the natural cubic spline for the following data.

a.  x     | f(x)
    8.3   | 17.56492
    8.6   | 18.50515

b.  x     | f(x)
    0.8   | 0.22363362
    1.0   | 0.65809197

c.  x     | f(x)
    -0.5  | -0.0247500
    -0.25 | 0.3349375
    0     | 1.1010000

d.  x     | f(x)
    0.1   | -0.62049958
    0.2   | -0.28398668
    0.3   | 0.00660095
    0.4   | 0.24842440


4. Construct the natural cubic spline for the following data.

a.  x     | f(x)
    0     | 1.00000
    0.5   | 2.71828

b.  x     | f(x)
    -0.25 | 1.33203
    0.25  | 0.800781

c.  x     | f(x)
    0.1   | -0.29004996
    0.2   | -0.56079734
    0.3   | -0.81401972

d.  x     | f(x)
    -1    | 0.86199480
    -0.5  | 0.95802009
    0     | 1.0986123
    0.5   | 1.2943767


5. The data in Exercise 3 were generated using the following functions. Use the cubic splines constructed
in Exercise 3 for the given value of x to approximate f(x) and f'(x), and calculate the actual error.
a. $f(x) = x \ln x$; approximate f(8.4) and f'(8.4).
b. $f(x) = \sin(e^x - 2)$; approximate f(0.9) and f'(0.9).
c. $f(x) = x^3 + 4.001x^2 + 4.002x + 1.101$; approximate $f(-\frac{1}{3})$ and $f'(-\frac{1}{3})$.
d. $f(x) = x \cos x - 2x^2 + 3x - 1$; approximate f(0.25) and f'(0.25).

6. The data in Exercise 4 were generated using the following functions. Use the cubic splines constructed
in Exercise 4 for the given value of x to approximate f(x) and f'(x), and calculate the actual error.
a. $f(x) = e^{2x}$; approximate f(0.43) and f'(0.43).
b. $f(x) = x^4 - x^3 + x^2 - x + 1$; approximate f(0) and f'(0).
c. $f(x) = x^2 \cos x - 3x$; approximate f(0.18) and f'(0.18).
d. $f(x) = \ln(e^x + 2)$; approximate f(0.25) and f'(0.25).

7. Construct the clamped cubic spline using the data of Exercise 3 and the fact that
a. f'(8.3) = 3.116256 and f'(8.6) = 3.151762
b. f'(0.8) = 2.1691753 and f'(1.0) = 2.0466965
c. f'(-0.5) = 0.7510000 and f'(0) = 4.0020000
d. f'(0.1) = 3.58502082 and f'(0.4) = 2.16529366

8. Construct the clamped cubic spline using the data of Exercise 4 and the fact that
a. f'(0) = 2 and f'(0.5) = 5.43656
b. f'(-0.25) = 0.437500 and f'(0.25) = -0.625000
c. f'(0.1) = -2.8004996 and f'(0) = -2.9734038
d. f'(-1) = 0.15536240 and f'(0.5) = 0.45186276
9. Repeat Exercise 5 using the clamped cubic splines constructed in Exercise 7. 
10. Repeat Exercise 6 using the clamped cubic splines constructed in Exercise 8. 
11. A natural cubic spline S on [0, 2] is defined by

\[
S(x) = \begin{cases}
S_0(x) = 1 + 2x - x^3, & \text{if } 0 \le x < 1,\\
S_1(x) = 2 + b(x-1) + c(x-1)^2 + d(x-1)^3, & \text{if } 1 \le x \le 2.
\end{cases}
\]

Find b, c, and d.
12. A clamped cubic spline s for a function f is defined on [1, 3] by

\[
s(x) = \begin{cases}
s_0(x) = 3(x-1) + 2(x-1)^2 - (x-1)^3, & \text{if } 1 \le x < 2,\\
s_1(x) = a + b(x-2) + c(x-2)^2 + d(x-2)^3, & \text{if } 2 \le x \le 3.
\end{cases}
\]

Given f'(1) = f'(3), find a, b, c, and d.
13. A natural cubic spline S is defined by

\[
S(x) = \begin{cases}
S_0(x) = 1 + B(x-1) - D(x-1)^3, & \text{if } 1 \le x < 2,\\
S_1(x) = 1 + b(x-2) - \frac{3}{4}(x-2)^2 + d(x-2)^3, & \text{if } 2 \le x \le 3.
\end{cases}
\]

If S interpolates the data (1, 1), (2, 1), and (3, 0), find B, D, b, and d.
14. A clamped cubic spline s for a function f is defined by

\[
s(x) = \begin{cases}
s_0(x) = 1 + Bx + 2x^2 - 2x^3, & \text{if } 0 \le x < 1,\\
s_1(x) = 1 + b(x-1) - 4(x-1)^2 + 7(x-1)^3, & \text{if } 1 \le x \le 2.
\end{cases}
\]

Find f'(0) and f'(2).
15. Construct a natural cubic spline to approximate $f(x) = \cos \pi x$ by using the values given by f(x) at
x = 0, 0.25, 0.5, 0.75, and 1.0. Integrate the spline over [0, 1], and compare the result to
$\int_0^1 \cos \pi x \, dx = 0$. Use the derivatives of the spline to approximate f'(0.5) and f''(0.5).
Compare these approximations to the actual values.
16. Construct a natural cubic spline to approximate $f(x) = e^{-x}$ by using the values given by f(x) at x = 0,
0.25, 0.75, and 1.0. Integrate the spline over [0, 1], and compare the result to $\int_0^1 e^{-x} \, dx = 1 - 1/e$.
Use the derivatives of the spline to approximate f'(0.5) and f''(0.5). Compare the approximations to
the actual values.


17. Repeat Exercise 15, constructing instead the clamped cubic spline with f’(0) = f’(1) = 0. 

18. Repeat Exercise 16, constructing instead the clamped cubic spline with $f'(0) = -1$, $f'(1) = -e^{-1}$.

19. Suppose that f(x) is a polynomial of degree 3. Show that f(x) is its own clamped cubic spline, but 
that it cannot be its own natural cubic spline. 

20. Suppose the data $\{(x_i, f(x_i))\}_{i=1}^{n}$ lie on a straight line. What can be said about the natural and clamped
cubic splines for the function f? [Hint: Take a cue from the results of Exercises 1 and 2.]

21. Given the partition $x_0 = 0$, $x_1 = 0.05$, and $x_2 = 0.1$ of [0, 0.1], find the piecewise linear interpolating
function F for $f(x) = e^{2x}$. Approximate $\int_0^{0.1} e^{2x} \, dx$ with $\int_0^{0.1} F(x) \, dx$, and compare the results to the
actual value.


22. Let $f \in C^2[a,b]$, and let the nodes $a = x_0 < x_1 < \cdots < x_n = b$ be given. Derive an error estimate
similar to that in Theorem 3.13 for the piecewise linear interpolating function F. Use this estimate to
derive error bounds for Exercise 21.


23. Extend Algorithms 3.4 and 3.5 to include as output the first and second derivatives of the spline at the 
nodes. 


24. Extend Algorithms 3.4 and 3.5 to include as output the integral of the spline over the interval $[x_0, x_n]$.
25. Given the partition $x_0 = 0$, $x_1 = 0.05$, $x_2 = 0.1$ of [0, 0.1] and $f(x) = e^{2x}$:
a. Find the cubic spline s with clamped boundary conditions that interpolates f.
b. Find an approximation for $\int_0^{0.1} e^{2x} \, dx$ by evaluating $\int_0^{0.1} s(x) \, dx$.
c. Use Theorem 3.13 to estimate $\max_{0 \le x \le 0.1} |f(x) - s(x)|$ and

\[
\left| \int_0^{0.1} f(x) \, dx - \int_0^{0.1} s(x) \, dx \right|.
\]

d. Determine the cubic spline S with natural boundary conditions, and compare S(0.02), s(0.02),
and $e^{0.04} = 1.04081077$.


26. Let f be defined on [a, b], and let the nodes $a = x_0 < x_1 < x_2 = b$ be given. A quadratic spline
interpolating function S consists of the quadratic polynomial

\[
S_0(x) = a_0 + b_0(x - x_0) + c_0(x - x_0)^2 \quad \text{on } [x_0, x_1]
\]

and the quadratic polynomial

\[
S_1(x) = a_1 + b_1(x - x_1) + c_1(x - x_1)^2 \quad \text{on } [x_1, x_2],
\]

such that

i. $S(x_0) = f(x_0)$, $S(x_1) = f(x_1)$, and $S(x_2) = f(x_2)$,
ii. $S \in C^1[x_0, x_2]$.
Show that conditions (i) and (ii) lead to five equations in the six unknowns $a_0$, $b_0$, $c_0$, $a_1$, $b_1$, and $c_1$.
The problem is to decide what additional condition to impose to make the solution unique. Does the
condition $S \in C^2[x_0, x_2]$ lead to a meaningful solution?
27. Determine a quadratic spline s that interpolates the data f(0) = 0, f(1) = 1, f(2) = 2 and satisfies
s'(0) = 2.
28. a. The introduction to this chapter included a table listing the population of the United States from 
1950 to 2000. Use natural cubic spline interpolation to approximate the population in the years 
1940, 1975, and 2020. 
b. The population in 1940 was approximately 132,165,000. How accurate do you think your 1975 
and 2020 figures are? 
29. A car traveling along a straight road is clocked at a number of points. The data from the observations
are given in the following table, where the time is in seconds, the distance is in feet, and the speed is
in feet per second.

Time     | 0  | 3   | 5   | 8   | 13
Distance | 0  | 225 | 383 | 623 | 993
Speed    | 75 | 77  | 80  | 74  | 72

a. Use a clamped cubic spline to predict the position of the car and its speed when t = 10 s.
b. Use the derivative of the spline to determine whether the car ever exceeds a 55-mi/h speed limit 
on the road; if so, what is the first time the car exceeds this speed? 
c. What is the predicted maximum speed for the car? 
30. The 2009 Kentucky Derby was won by a horse named Mine That Bird (at more than 50:1 odds)
in a time of 2:02.66 (2 minutes and 2.66 seconds) for the 1 1/4-mile race. Times at the quarter-mile,
half-mile, and mile poles were 0:22.98, 0:47.23, and 1:37.49.

a. Use these values together with the starting time to construct a natural cubic spline for Mine That 
Bird’s race. 

b. Use the spline to predict the time at the three-quarter-mile pole, and compare this to the actual 
time of 1:12.09. 

c. Use the spline to approximate Mine That Bird’s starting speed and speed at the finish line. 

31. It is suspected that the high amounts of tannin in mature oak leaves inhibit the growth of the winter 
moth (Operophtera bromata L., Geometridae) larvae that extensively damage these trees in certain 
years. The following table lists the average weight of two samples of larvae at times in the first 28 days 
after birth. The first sample was reared on young oak leaves, whereas the second sample was reared 
on mature leaves from the same tree. 


a. Use a natural cubic spline to approximate the average weight curve for each sample.
b. Find an approximate maximum average weight for each sample by determining the maximum
of the spline.


Day                          | 0    | 6     | 10    | 13    | 17    | 20    | 28
Sample 1 average weight (mg) | 6.67 | 17.33 | 42.67 | 37.33 | 30.10 | 29.31 | 28.74
Sample 2 average weight (mg) | 6.67 | 16.11 | 18.89 | 15.00 | 10.56 | 9.44  | 8.89


32. The upper portion of this noble beast is to be approximated using clamped cubic spline interpolants.
The curve is drawn on a grid from which the table is constructed. Use Algorithm 3.5 to construct the
three clamped cubic splines.

[Figure: profile of the beast drawn on a grid, with f(x) on the vertical axis and x from 0 to 30 on the
horizontal axis, divided into Curve 1, Curve 2, and Curve 3, with the endpoint slopes marked.]

Curve 1
i | x_i | f(x_i) | f'(x_i)
0 | 1   | 3.0    | 1.0
1 | 2   | 3.7    |
2 | 5   | 3.9    |
3 | 6   | 4.2    |
4 | 7   | 5.7    |
5 | 8   | 6.6    |
6 | 10  | 7.1    |
7 | 13  | 6.7    |
8 | 17  | 4.5    | -0.67

Curve 2
i | x_i  | f(x_i) | f'(x_i)
0 | 17   | 4.5    | 3.0
1 | 20   | 7.0    |
2 | 23   | 6.1    |
3 | 24   | 5.6    |
4 | 25   | 5.8    |
5 | 27   | 5.2    |
6 | 27.7 | 4.1    | -4.0

Curve 3
i | x_i  | f(x_i) | f'(x_i)
0 | 27.7 | 4.1    | 0.33
1 | 28   | 4.3    |
2 | 29   | 4.1    |
3 | 30   | 3.0    | -1.5


33. Repeat Exercise 32, constructing three natural splines using Algorithm 3.4. 


3.6 Parametric Curves


None of the techniques developed in this chapter can be used to generate curves of the form 
shown in Figure 3.15 because this curve cannot be expressed as a function of one coordinate 
variable in terms of the other. In this section we will see how to represent general curves 
by using a parameter to express both the x- and y-coordinate variables. Any good book
on computer graphics will show how this technique can be extended to represent general
curves and surfaces in space. (See, for example, [FVFH].)


Figure 3.15 


A straightforward parametric technique for determining a polynomial or piecewise
polynomial to connect the points $(x_0, y_0), (x_1, y_1), \ldots, (x_n, y_n)$ in the order given is to use
a parameter $t$ on an interval $[t_0, t_n]$, with $t_0 < t_1 < \cdots < t_n$, and construct approximation
functions with

\[
x_i = x(t_i) \quad\text{and}\quad y_i = y(t_i), \quad \text{for each } i = 0, 1, \ldots, n.
\]

The following example demonstrates the technique in the case where both approximat- 
ing functions are Lagrange interpolating polynomials. 
Example 1 Construct a pair of Lagrange polynomials to approximate the curve shown in Figure 3.15,
using the data points shown on the curve.


Solution There is flexibility in choosing the parameter, and we will choose the points
$\{t_i\}_{i=0}^{4}$ equally spaced in [0, 1], which gives the data in Table 3.20.


Table 3.20
i   | 0  | 1    | 2   | 3    | 4
t_i | 0  | 0.25 | 0.5 | 0.75 | 1
x_i | -1 | 0    | 1   | 0    | 1
y_i | 0  | 1    | 0.5 | 0    | -1

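This produces the interpolating polynomials

\[
x(t) = \left(\left(\left(64t - \tfrac{352}{3}\right)t + 60\right)t - \tfrac{14}{3}\right)t - 1
\quad\text{and}\quad
y(t) = \left(\left(\left(-\tfrac{64}{3}t + 48\right)t - \tfrac{116}{3}\right)t + 11\right)t.
\]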
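A pair of parametric interpolating polynomials is easy to check: at each parameter value $t_i$ they must return the data point $(x_i, y_i)$. The Python sketch below evaluates the nested forms $x(t) = (((64t - \frac{352}{3})t + 60)t - \frac{14}{3})t - 1$ and $y(t) = (((-\frac{64}{3}t + 48)t - \frac{116}{3})t + 11)t$ in exact rational arithmetic. The fractional coefficients are stated here as an assumption to be verified against the data of Table 3.20, and the function names are illustrative.

```python
from fractions import Fraction as F

def x_poly(t):
    """Nested (Horner) form of the interpolating polynomial for the x-coordinates."""
    return (((64 * t - F(352, 3)) * t + 60) * t - F(14, 3)) * t - 1

def y_poly(t):
    """Nested (Horner) form of the interpolating polynomial for the y-coordinates."""
    return (((-F(64, 3) * t + 48) * t - F(116, 3)) * t + 11) * t

# Parameter values t_i equally spaced in [0, 1], as in Table 3.20.
t_nodes = [F(0), F(1, 4), F(1, 2), F(3, 4), F(1)]
curve_points = [(x_poly(t), y_poly(t)) for t in t_nodes]

for t, (xv, yv) in zip(t_nodes, curve_points):
    print(f"t = {float(t):4.2f}: (x, y) = ({float(xv):g}, {float(yv):g})")
```

Using `Fraction` avoids any rounding, so agreement with the table is exact rather than approximate. Evaluating the same two functions on a fine grid of $t$ values gives points for plotting the parametric curve.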


Plotting this parametric system produces the graph shown in blue in Figure 3.16. Although
it passes through the required points and has the same basic shape, it is quite a crude
approximation to the original curve. A more accurate approximation would require additional
nodes, with the accompanying increase in computation.

Figure 3.16


A successful computer design 
system needs to be based on a 
formal mathematical theory so 
that the results are predictable, 
but this theory should be 
performed in the background so 
that the artist can base the design 
on aesthetics. 




Parametric Hermite and spline curves can be generated in a similar manner, but these 
also require extensive computational effort. 

Applications in computer graphics require the rapid generation of smooth curves that 
can be easily and quickly modified. For both aesthetic and computational reasons, changing 
one portion of these curves should have little or no effect on other portions of the curves. 
This eliminates the use of interpolating polynomials and splines since changing one portion 
of these curves affects the whole curve. 

The choice of curve for use in computer graphics is generally a form of the piece- 
wise cubic Hermite polynomial. Each portion of a cubic Hermite polynomial is completely 
determined by specifying its endpoints and the derivatives at these endpoints. As a conse- 
quence, one portion of the curve can be changed while leaving most of the curve the same. 
Only the adjacent portions need to be modified to ensure smoothness at the endpoints. The 
computations can be performed quickly, and the curve can be modified a section at a time. 

The problem with Hermite interpolation is the need to specify the derivatives at
the endpoints of each section of the curve. Suppose the curve has $n + 1$ data points
$(x(t_0), y(t_0)), \ldots, (x(t_n), y(t_n))$, and we wish to parameterize the cubic to allow complex
features. Then we must specify $x'(t_i)$ and $y'(t_i)$, for each $i = 0, 1, \ldots, n$. This is not as
difficult as it would first appear, since each portion is generated independently. We must
ensure only that the derivatives at the endpoints of each portion match those in the adjacent
portion. Essentially, then, we can simplify the process to one of determining a pair of cubic
Hermite polynomials in the parameter $t$, where $t_0 = 0$ and $t_1 = 1$, given the endpoint data
$(x(0), y(0))$ and $(x(1), y(1))$ and the derivatives $dy/dx$ (at $t = 0$) and $dy/dx$ (at $t = 1$).

Notice, however, that we are specifying only six conditions, and the cubic polynomials
in $x(t)$ and $y(t)$ each have four parameters, for a total of eight. This provides flexibility
in choosing the pair of cubic Hermite polynomials to satisfy the conditions, because the
natural form for determining $x(t)$ and $y(t)$ requires that we specify $x'(0)$, $x'(1)$, $y'(0)$, and
$y'(1)$. The explicit Hermite curve in x and y requires specifying only the quotients

\[
\frac{dy}{dx}(t = 0) = \frac{y'(0)}{x'(0)} \quad\text{and}\quad \frac{dy}{dx}(t = 1) = \frac{y'(1)}{x'(1)}.
\]

By multiplying $x'(0)$ and $y'(0)$ by a common scaling factor, the tangent line to the curve
at $(x(0), y(0))$ remains the same, but the shape of the curve varies. The larger the scaling
factor, the closer the curve comes to approximating the tangent line near $(x(0), y(0))$. A
similar situation exists at the other endpoint $(x(1), y(1))$.

To further simplify the process in interactive computer graphics, the derivative at an 
endpoint is specified by using a second point, called a guidepoint, on the desired tangent 
line. The farther the guidepoint is from the node, the more closely the curve approximates 
the tangent line near the node. 

In Figure 3.17, the nodes occur at $(x_0, y_0)$ and $(x_1, y_1)$, the guidepoint for $(x_0, y_0)$ is
$(x_0 + \alpha_0, y_0 + \beta_0)$, and the guidepoint for $(x_1, y_1)$ is $(x_1 - \alpha_1, y_1 - \beta_1)$. The cubic Hermite
polynomial $x(t)$ on [0, 1] satisfies

\[
x(0) = x_0, \quad x(1) = x_1, \quad x'(0) = \alpha_0, \quad\text{and}\quad x'(1) = \alpha_1.
\]


Figure 3.17


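The unique cubic polynomial satisfying these conditions is

\[
x(t) = [2(x_0 - x_1) + (\alpha_0 + \alpha_1)]t^3 + [3(x_1 - x_0) - (\alpha_1 + 2\alpha_0)]t^2 + \alpha_0 t + x_0. \tag{3.23}
\]

In a similar manner, the unique cubic polynomial satisfying

\[
y(0) = y_0, \quad y(1) = y_1, \quad y'(0) = \beta_0, \quad\text{and}\quad y'(1) = \beta_1
\]

is

\[
y(t) = [2(y_0 - y_1) + (\beta_0 + \beta_1)]t^3 + [3(y_1 - y_0) - (\beta_1 + 2\beta_0)]t^2 + \beta_0 t + y_0. \tag{3.24}
\]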


Example 2 Determine the graph of the parametric curve generated by Eqs. (3.23) and (3.24) when the
endpoints are $(x_0, y_0) = (0,0)$ and $(x_1, y_1) = (1,0)$, and the respective guidepoints, as shown in
Figure 3.18, are (1, 1) and (0, 1).

Figure 3.18

Solution The endpoint information implies that $x_0 = 0$, $x_1 = 1$, $y_0 = 0$, and $y_1 = 0$, and
the guidepoints at (1, 1) and (0, 1) imply that $\alpha_0 = 1$, $\alpha_1 = 1$, $\beta_0 = 1$, and $\beta_1 = -1$. Note
that the slopes of the guide lines at (0, 0) and (1, 0) are, respectively,

\[
\frac{\beta_0}{\alpha_0} = \frac{1}{1} = 1 \quad\text{and}\quad \frac{\beta_1}{\alpha_1} = \frac{-1}{1} = -1.
\]

Equations (3.23) and (3.24) imply that for $t \in [0, 1]$ we have

\[
x(t) = [2(0 - 1) + (1 + 1)]t^3 + [3(1 - 0) - (1 + 2 \cdot 1)]t^2 + 1 \cdot t + 0 = t
\]

and

\[
y(t) = [2(0 - 0) + (1 + (-1))]t^3 + [3(0 - 0) - (-1 + 2 \cdot 1)]t^2 + 1 \cdot t + 0 = -t^2 + t.
\]

This graph is shown as (a) in Figure 3.19, together with some other possibilities of curves
produced by Eqs. (3.23) and (3.24) when the nodes are (0, 0) and (1, 0) and the slopes at
these nodes are 1 and -1, respectively.
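The computation in Example 2 can be reproduced for any pair of nodes and guidepoints. The following Python sketch applies the coefficient formulas of Eqs. (3.23) and (3.24) to one coordinate at a time; the function name `hermite_cubic` and its argument order are assumptions made for illustration. The left guidepoint is taken to be the node plus $(\alpha_0, \beta_0)$ and the right guidepoint the node minus $(\alpha_1, \beta_1)$, as in Figure 3.17.

```python
def hermite_cubic(p0, p1, g0, g1):
    """Coefficients (t^3, t^2, t, 1) of the cubic Hermite polynomial for one coordinate.

    p0, p1 : coordinate of the left and right nodes
    g0, g1 : the same coordinate of the left and right guidepoints
    """
    alpha0 = g0 - p0          # left guidepoint is p0 + alpha0
    alpha1 = p1 - g1          # right guidepoint is p1 - alpha1
    return (2 * (p0 - p1) + (alpha0 + alpha1),
            3 * (p1 - p0) - (alpha1 + 2 * alpha0),
            alpha0,
            p0)

# Example 2: nodes (0, 0) and (1, 0), guidepoints (1, 1) and (0, 1).
x_coeffs = hermite_cubic(0, 1, 1, 0)   # x-coordinates of nodes and guidepoints
y_coeffs = hermite_cubic(0, 0, 1, 1)   # y-coordinates of nodes and guidepoints

print("x(t) coefficients (t^3, t^2, t, 1):", x_coeffs)
print("y(t) coefficients (t^3, t^2, t, 1):", y_coeffs)
```

Moving a guidepoint farther from its node along the same line changes $\alpha$ and $\beta$ by a common factor, which keeps the tangent direction but changes the shape of the curve, as the other panels of Figure 3.19 suggest.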


Figure 3.19

Pierre Etienne Bézier
(1910-1999) was head of design 
and production for Renault 
motorcars for most of his 
professional life. He began his 
research into computer-aided 
design and manufacturing in 
1960, developing interactive tools 
for curve and surface design, and 
initiated computer-generated 
milling for automobile modeling. 


The Bézier curves that bear his 
name have the advantage of being 
based on a rigorous mathematical 
theory that does not need to be 
explicitly recognized by the 
practitioner who simply wants to 
make an aesthetically pleasing 
curve or surface. These are the 
curves that are the basis of the 
powerful Adobe PostScript
system, and produce the freehand
curves that are generated in most
sufficiently powerful computer
graphics packages.

The standard procedure for determining curves in an interactive graphics mode is to first
use a mouse or touchpad to set the nodes and guidepoints to generate a first approximation 
to the curve. These can be set manually, but most graphics systems permit you to use your 
input device to draw the curve on the screen freehand and will select appropriate nodes and 
guidepoints for your freehand curve. 

The nodes and guidepoints can then be manipulated into a position that produces an 
aesthetically pleasing curve. Since the computation is minimal, the curve can be determined 
so quickly that the resulting change is seen immediately. Moreover, all the data needed to 
compute the curves are imbedded in the coordinates of the nodes and guidepoints, so no 
analytical knowledge is required of the user. 

Popular graphics programs use this type of system for their freehand graphic representations
in a slightly modified form. The Hermite cubics are described as Bézier polynomials,
which incorporate a scaling factor of 3 when computing the derivatives at the endpoints.
This modifies the parametric equations to

\[
x(t) = [2(x_0 - x_1) + 3(\alpha_0 + \alpha_1)]t^3 + [3(x_1 - x_0) - 3(\alpha_1 + 2\alpha_0)]t^2 + 3\alpha_0 t + x_0, \tag{3.25}
\]

and

\[
y(t) = [2(y_0 - y_1) + 3(\beta_0 + \beta_1)]t^3 + [3(y_1 - y_0) - 3(\beta_1 + 2\beta_0)]t^2 + 3\beta_0 t + y_0, \tag{3.26}
\]

for $0 \le t \le 1$, but this change is transparent to the user of the system.
Algorithm 3.6 constructs a set of Bézier curves based on the parametric equations in
Eqs. (3.25) and (3.26).


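Bézier Curve


To construct the cubic Bézier curves $C_0, \ldots, C_{n-1}$ in parametric form, where $C_i$ is
represented by

\[
(x_i(t), y_i(t)) = \left(a_0^{(i)} + a_1^{(i)} t + a_2^{(i)} t^2 + a_3^{(i)} t^3,\; b_0^{(i)} + b_1^{(i)} t + b_2^{(i)} t^2 + b_3^{(i)} t^3\right),
\]

for $0 \le t \le 1$, as determined by the left endpoint $(x_i, y_i)$, left guidepoint $(x_i^+, y_i^+)$, right
endpoint $(x_{i+1}, y_{i+1})$, and right guidepoint $(x_{i+1}^-, y_{i+1}^-)$ for each $i = 0, 1, \ldots, n-1$:


INPUT $n$; $(x_0, y_0), \ldots, (x_n, y_n)$; $(x_0^+, y_0^+), \ldots, (x_{n-1}^+, y_{n-1}^+)$; $(x_1^-, y_1^-), \ldots, (x_n^-, y_n^-)$.
OUTPUT coefficients $\{a_0^{(i)}, a_1^{(i)}, a_2^{(i)}, a_3^{(i)}, b_0^{(i)}, b_1^{(i)}, b_2^{(i)}, b_3^{(i)}, \text{ for } 0 \le i \le n-1\}$.

Step 1 For each $i = 0, 1, \ldots, n-1$ do Steps 2 and 3.

Step 2 Set $a_0^{(i)} = x_i$;
    $b_0^{(i)} = y_i$;
    $a_1^{(i)} = 3(x_i^+ - x_i)$;
    $b_1^{(i)} = 3(y_i^+ - y_i)$;
    $a_2^{(i)} = 3(x_i + x_{i+1}^- - 2x_i^+)$;
    $b_2^{(i)} = 3(y_i + y_{i+1}^- - 2y_i^+)$;
    $a_3^{(i)} = x_{i+1} - x_i + 3x_i^+ - 3x_{i+1}^-$;
    $b_3^{(i)} = y_{i+1} - y_i + 3y_i^+ - 3y_{i+1}^-$.

Step 3 OUTPUT $(a_0^{(i)}, a_1^{(i)}, a_2^{(i)}, a_3^{(i)}, b_0^{(i)}, b_1^{(i)}, b_2^{(i)}, b_3^{(i)})$.
Step 4 STOP.

Three-dimensional curves are generated in a similar manner by additionally specifying
third components $z_0$ and $z_1$ for the nodes and $z_0 + \gamma_0$ and $z_1 - \gamma_1$ for the guidepoints. The more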
difficult problem involving the representation of three-dimensional curves concerns the loss 
of the third dimension when the curve is projected onto a two-dimensional computer screen. 
Various projection techniques are used, but this topic lies within the realm of computer 
graphics. For an introduction to this topic and ways that the technique can be modified for 
surface representations, see one of the many books on computer graphics methods, such as 
[FVFH]. 
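The steps of the Bézier curve algorithm translate almost line for line into code. The following Python sketch is one possible rendering, under the assumption that nodes and guidepoints are supplied as tuples of coordinates; the names `bezier_coefficients` and `evaluate` are illustrative. Because each coordinate is treated independently, the same function handles two-dimensional and three-dimensional points.

```python
def bezier_coefficients(nodes, left_guides, right_guides):
    """Cubic Bezier coefficients for each curve C_i joining nodes[i] to nodes[i+1].

    nodes        : n+1 points, each a tuple of coordinates
    left_guides  : left_guides[i] is the guidepoint leaving nodes[i], i = 0..n-1
    right_guides : right_guides[i] is the guidepoint entering nodes[i+1], i = 0..n-1
    Returns, for each curve, one tuple (c0, c1, c2, c3) per coordinate so that the
    coordinate equals c0 + c1*t + c2*t**2 + c3*t**3 for 0 <= t <= 1.
    """
    curves = []
    for i in range(len(nodes) - 1):
        p, q = nodes[i], nodes[i + 1]
        gp, gm = left_guides[i], right_guides[i]
        curve = []
        for k in range(len(p)):
            c0 = p[k]
            c1 = 3 * (gp[k] - p[k])
            c2 = 3 * (p[k] + gm[k] - 2 * gp[k])
            c3 = q[k] - p[k] + 3 * gp[k] - 3 * gm[k]
            curve.append((c0, c1, c2, c3))
        curves.append(curve)
    return curves

def evaluate(curve, t):
    """Point on one Bezier curve at parameter t, using nested multiplication."""
    return tuple(((c3 * t + c2) * t + c1) * t + c0 for c0, c1, c2, c3 in curve)

# One curve from (0, 0) to (1, 0) with guidepoints (1, 1) and (0, 1).
curves = bezier_coefficients([(0, 0), (1, 0)], [(1, 1)], [(0, 1)])
print("x coefficients:", curves[0][0])
print("y coefficients:", curves[0][1])
print("start:", evaluate(curves[0], 0), "end:", evaluate(curves[0], 1))
```

For these nodes and guidepoints the coefficients agree with Eqs. (3.25) and (3.26) for $\alpha_0 = \alpha_1 = 1$, $\beta_0 = 1$, $\beta_1 = -1$: the linear terms are three times those of the corresponding Hermite cubic.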


EXERCISE SET 3.6 


1. Let $(x_0, y_0) = (0,0)$ and $(x_1, y_1) = (5,2)$ be the endpoints of a curve. Use the given guidepoints
to construct parametric cubic Hermite approximations (x(t), y(t)) to the curve, and graph the
approximations.
a. (1, 1) and (6, 1)
b. (0.5, 0.5) and (5.5, 1.5)
c. (1, 1) and (6, 3)
d. (2, 2) and (7, 0)

2. Repeat Exercise 1 using cubic Bézier polynomials.

3. Construct and graph the cubic Bézier polynomials given the following points and guidepoints.
a. Point (1, 1) with guidepoint (1.5, 1.25) to point (6, 2) with guidepoint (7, 3)
b. Point (1, 1) with guidepoint (1.25, 1.5) to point (6, 2) with guidepoint (5, 3)
c. Point (0, 0) with guidepoint (0.5, 0.5) to point (4, 6) with entering guidepoint (3.5, 7) and exiting
guidepoint (4.5, 5) to point (6, 1) with guidepoint (7, 2)
d. Point (0, 0) with guidepoint (0.5, 0.5) to point (2, 1) with entering guidepoint (3, 1) and exiting
guidepoint (3, 1) to point (4, 0) with entering guidepoint (5, 1) and exiting guidepoint (3, -1)
to point (6, -1) with guidepoint (6.5, -0.25)

4. Use the data in the following table and Algorithm 3.6 to approximate the shape of the letter N.


i | x_i | y_i | α_i | β_i | α'_i | β'_i
0 | 3   | 6   | 3.3 | 6.5 |      |
1 | 2   | 2   | 2.8 | 3.0 | 2.5  | 2.5
2 | 6   | 6   | 5.8 | 5.0 | 5.0  | 5.8
3 | 5   | 2   | 5.5 | 2.2 | 4.5  | 2.5
4 | 6.5 | 3   |     |     | 6.4  | 2.8

5. Suppose a cubic Bézier polynomial is placed through $(u_0, v_0)$ and $(u_3, v_3)$ with guidepoints $(u_1, v_1)$
and $(u_2, v_2)$, respectively.

a. Derive the parametric equations for u(t) and v(t) assuming that

\[
u(0) = u_0, \quad u(1) = u_3, \quad u'(0) = u_1 - u_0, \quad u'(1) = u_3 - u_2
\]

and

\[
v(0) = v_0, \quad v(1) = v_3, \quad v'(0) = v_1 - v_0, \quad v'(1) = v_3 - v_2.
\]

b. Let $f(i/3) = u_i$, for $i = 0, 1, 2, 3$ and $g(i/3) = v_i$, for $i = 0, 1, 2, 3$. Show that the Bernstein
polynomial of degree 3 in t for f is u(t) and the Bernstein polynomial of degree 3 in t for g
is v(t). (See Exercise 23 of Section 3.1.)

3.7 Survey of Methods and Software


In this chapter we have considered approximating a function using polynomials and piecewise
polynomials. The function can be specified by a given defining equation or by providing
points in the plane through which the graph of the function passes. A set of nodes
$x_0, x_1, \ldots, x_n$ is given in each case, and more information, such as the value of various
derivatives, may also be required. We need to find an approximating function that satisfies
the conditions specified by these data.

The interpolating polynomial $P(x)$ is the polynomial of least degree that satisfies, for
a function $f$,

\[
P(x_i) = f(x_i), \quad \text{for each } i = 0, 1, \ldots, n.
\]


Although this interpolating polynomial is unique, it can take many different forms. The 
Lagrange form is most often used for interpolating tables when n is small and for deriving 
formulas for approximating derivatives and integrals. Neville’s method is used for eval- 
uating several interpolating polynomials at the same value of x. Newton’s forms of the 
polynomial are more appropriate for computation and are also used extensively for deriv- 
ing formulas for solving differential equations. However, polynomial interpolation has the 
inherent weaknesses of oscillation, particularly if the number of nodes is large. In this case 
there are other methods that can be better applied. 

The Hermite polynomials interpolate a function and its derivative at the nodes. They 
can be very accurate but require more information about the function being approximated. 
When there are a large number of nodes, the Hermite polynomials also exhibit oscillation 
weaknesses. 

The most commonly used form of interpolation is piecewise-polynomial interpolation. 
If function and derivative values are available, piecewise cubic Hermite interpolation is 
recommended. This is the preferred method for interpolating values of a function that is 
the solution to a differential equation. When only the function values are available, natural 
cubic spline interpolation can be used. This spline forces the second derivative of the spline 
to be zero at the endpoints. Other cubic splines require additional data. For example, the 
clamped cubic spline needs values of the derivative of the function at the endpoints of the 
interval. 

Other methods of interpolation are commonly used. Trigonometric interpolation, in 
particular the Fast Fourier Transform discussed in Chapter 8, is used with large amounts 
of data when the function is assumed to have a periodic nature. Interpolation by rational 
functions is also used. 

If the data are suspected to be inaccurate, smoothing techniques can be applied, and 
some form of least squares fit of data is recommended. Polynomials, trigonometric functions, 
rational functions, and splines can be used in least squares fitting of data. We consider these 
topics in Chapter 8. 

Interpolation routines included in the IMSL Library are based on the book A Practical 
Guide to Splines by Carl de Boor [Deb] and use interpolation by cubic splines. There 
are cubic splines to minimize oscillations and to preserve concavity. Methods for two- 
dimensional interpolation by bicubic splines are also included. 

The NAG library contains subroutines for polynomial and Hermite interpolation, for 
cubic spline interpolation, and for piecewise cubic Hermite interpolation. NAG also contains 
subroutines for interpolating functions of two variables. 

The netlib library contains the subroutines to compute the cubic spline with various
endpoint conditions. One package produces the Newton's divided difference coefficients for
a discrete set of data points, and there are various routines for evaluating Hermite piecewise
polynomials.

MATLAB can be used to interpolate a discrete set of data points, using either nearest 
neighbor interpolation, linear interpolation, cubic spline interpolation, or cubic interpola- 
tion. Cubic splines can also be produced. 

General references to the methods in this chapter are the books by Powell [Pow] and 
by Davis [Da]. The seminal paper on splines is due to Schoenberg [Scho]. Important books 
on splines are by Schultz [Schul], De Boor [Deb2], Dierckx [Di], and Schumaker [Schum].

Numerical Differentiation and Integration


Introduction 


A sheet of corrugated roofing is constructed by pressing a flat sheet of aluminum into one 
whose cross section has the form of a sine wave. 


A corrugated sheet 4 ft long is needed, the height of each wave is 1 in. from the center
line, and each wave has a period of approximately $2\pi$ in. The problem of finding the length
of the initial flat sheet is one of determining the length of the curve given by $f(x) = \sin x$
from x = 0 in. to x = 48 in. From calculus we know that this length is

\[
L = \int_0^{48} \sqrt{1 + (f'(x))^2}\,dx = \int_0^{48} \sqrt{1 + (\cos x)^2}\,dx,
\]



so the problem reduces to evaluating this integral. Although the sine function is one of 
the most common mathematical functions, the calculation of its length involves an elliptic 
integral of the second kind, which cannot be evaluated explicitly. Methods are developed in 
this chapter to approximate the solution to problems of this type. This particular problem 
is considered in Exercise 25 of Section 4.4 and Exercise 12 of Section 4.5. 

We mentioned in the introduction to Chapter 3 that one reason for using alge- 
braic polynomials to approximate an arbitrary set of data is that, given any continuous 
function defined on a closed interval, there exists a polynomial that is arbitrarily close to 
the function at every point in the interval. Also, the derivatives and integrals of polyno- 
mials are easily obtained and evaluated. It should not be surprising, then, that many 
procedures for approximating derivatives and integrals use the polynomials that 
approximate the function. 

4.1 Numerical Differentiation 


Difference equations were used 
and popularized by Isaac Newton 
in the last quarter of the 17th 
century, but many of these 
techniques had previously been 
developed by Thomas Harriot 
(1561-1621) and Henry Briggs 
(1561-1630). Harriot made 
significant advances in navigation 
techniques, and Briggs was the 
person most responsible for the 
acceptance of logarithms as an 
aid to computation. 


Example 1 


The derivative of the function $f$ at $x_0$ is 

\[ f'(x_0) = \lim_{h \to 0} \frac{f(x_0 + h) - f(x_0)}{h}. \]

This formula gives an obvious way to generate an approximation to $f'(x_0)$; simply compute 

\[ \frac{f(x_0 + h) - f(x_0)}{h} \]


for small values of h. Although this may be obvious, it is not very successful, due to our 
old nemesis round-off error. But it is certainly a place to start. 

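To approximate $f'(x_0)$, suppose first that $x_0 \in (a,b)$, where $f \in C^2[a,b]$, and that 
$x_1 = x_0 + h$ for some $h \neq 0$ that is sufficiently small to ensure that $x_1 \in [a,b]$. We construct 
the first Lagrange polynomial $P_{0,1}(x)$ for $f$ determined by $x_0$ and $x_1$, with its error term: 

\[ f(x) = P_{0,1}(x) + \frac{(x - x_0)(x - x_1)}{2!} f''(\xi(x)) \]
\[ \phantom{f(x)} = \frac{f(x_0)(x - x_0 - h)}{-h} + \frac{f(x_0 + h)(x - x_0)}{h} + \frac{(x - x_0)(x - x_0 - h)}{2} f''(\xi(x)), \]

for some $\xi(x)$ between $x_0$ and $x_1$. Differentiating gives 

\[ f'(x) = \frac{f(x_0 + h) - f(x_0)}{h} + D_x\left[ \frac{(x - x_0)(x - x_0 - h)}{2} f''(\xi(x)) \right] \]
\[ \phantom{f'(x)} = \frac{f(x_0 + h) - f(x_0)}{h} + \frac{2(x - x_0) - h}{2} f''(\xi(x)) + \frac{(x - x_0)(x - x_0 - h)}{2} D_x\big(f''(\xi(x))\big). \]

Deleting the terms involving $\xi(x)$ gives 

\[ f'(x) \approx \frac{f(x_0 + h) - f(x_0)}{h}. \]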


One difficulty with this formula is that we have no information about $D_x f''(\xi(x))$, so the 
truncation error cannot be estimated. When $x$ is $x_0$, however, the coefficient of $D_x f''(\xi(x))$ 
is 0, and the formula simplifies to 

\[ f'(x_0) = \frac{f(x_0 + h) - f(x_0)}{h} - \frac{h}{2} f''(\xi). \tag{4.1} \]

For small values of $h$, the difference quotient $[f(x_0 + h) - f(x_0)]/h$ can be used to 
approximate $f'(x_0)$ with an error bounded by $M|h|/2$, where $M$ is a bound on $|f''(x)|$ for $x$ 
between $x_0$ and $x_0 + h$. This formula is known as the forward-difference formula if $h > 0$ 
(see Figure 4.1) and the backward-difference formula if $h < 0$. 


Use the forward-difference formula to approximate the derivative of $f(x) = \ln x$ at $x_0 = 1.8$ 
using $h = 0.1$, $h = 0.05$, and $h = 0.01$, and determine bounds for the approximation errors. 


Solution The forward-difference formula 

\[ \frac{f(1.8 + h) - f(1.8)}{h} \]


Copyright 2010 Cengage Learning. All Rights Reserved. May not be copied, scanned, or duplicated, in whole or in part. Due to electronic rights, some third party content may be suppressed from the eBook and/or eChapter(s). 
Editorial review has deemed that any suppressed content does not materially affect the overall learning experience. Cengage Learning reserves the right to remove additional content at any time if subsequent rights restrictions require it. 


with $h = 0.1$ gives 

\[ \frac{\ln 1.9 - \ln 1.8}{0.1} = \frac{0.64185389 - 0.58778667}{0.1} = 0.5406722. \]

Because $f''(x) = -1/x^2$ and $1.8 < \xi < 1.9$, a bound for this approximation error is 

\[ \frac{|h f''(\xi)|}{2} = \frac{|h|}{2\xi^2} < \frac{0.1}{2(1.8)^2} = 0.0154321. \]

The approximation and error bounds when $h = 0.05$ and $h = 0.01$ are found in a similar 
manner and the results are shown in Table 4.1. 

Table 4.1 
h     | f(1.8 + h) | [f(1.8 + h) - f(1.8)]/h | |h|/(2(1.8)^2) 
0.1   | 0.64185389 | 0.5406722               | 0.0154321 
0.05  | 0.61518564 | 0.5479795               | 0.0077160 
0.01  | 0.59332685 | 0.5540180               | 0.0015432 


Since $f'(x) = 1/x$, the exact value of $f'(1.8)$ is $1/1.8 = 0.5555\ldots$, and in this case the error bounds are 
quite close to the true approximation error. 

[Figure 4.1: the tangent line with slope $f'(x_0)$ compared with the secant line with slope $[f(x_0 + h) - f(x_0)]/h$.] 
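The computations in Example 1 can be reproduced with a few lines of code. The following is only a sketch: the helper name `forward_difference` and the layout of `rows` are illustrative choices, not part of the text, and the bound uses the fact that $|f''(x)| = 1/x^2$ is largest at the left end of $[1.8, 1.8 + h]$.

```python
import math

def forward_difference(f, x0, h):
    """Difference quotient [f(x0 + h) - f(x0)] / h from Eq. (4.1)."""
    return (f(x0 + h) - f(x0)) / h

x0 = 1.8
exact = 1.0 / x0  # f'(x) = 1/x for f(x) = ln x

rows = []
for h in (0.1, 0.05, 0.01):
    approx = forward_difference(math.log, x0, h)
    # Error bound |h| * M / 2 with M = max |f''| = 1 / x0**2 on [x0, x0 + h].
    bound = abs(h) / (2.0 * x0**2)
    rows.append((h, approx, bound, abs(exact - approx)))

for h, approx, bound, err in rows:
    print(f"h={h:<5} approx={approx:.7f} bound={bound:.7f} actual error={err:.7f}")
```

In each row the actual error should sit just below the bound, which is the observation made at the end of the example.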


To obtain general derivative approximation formulas, suppose that $\{x_0, x_1, \ldots, x_n\}$ are 
$(n + 1)$ distinct numbers in some interval $I$ and that $f \in C^{n+1}(I)$. From Theorem 3.3 on 
page 112, 

\[ f(x) = \sum_{k=0}^{n} f(x_k) L_k(x) + \frac{(x - x_0) \cdots (x - x_n)}{(n+1)!} f^{(n+1)}(\xi(x)), \]

for some $\xi(x)$ in $I$, where $L_k(x)$ denotes the $k$th Lagrange coefficient polynomial for $f$ at 
$x_0, x_1, \ldots, x_n$. Differentiating this expression gives 

\[ f'(x) = \sum_{k=0}^{n} f(x_k) L_k'(x) + D_x\left[ \frac{(x - x_0) \cdots (x - x_n)}{(n+1)!} \right] f^{(n+1)}(\xi(x)) + \frac{(x - x_0) \cdots (x - x_n)}{(n+1)!} D_x\big[ f^{(n+1)}(\xi(x)) \big]. \]

We again have a problem estimating the truncation error unless $x$ is one of the numbers 
$x_j$. In this case, the term multiplying $D_x[f^{(n+1)}(\xi(x))]$ is 0, and the formula becomes 

\[ f'(x_j) = \sum_{k=0}^{n} f(x_k) L_k'(x_j) + \frac{f^{(n+1)}(\xi(x_j))}{(n+1)!} \prod_{\substack{k=0 \\ k \neq j}}^{n} (x_j - x_k), \tag{4.2} \]

which is called an $(n + 1)$-point formula to approximate $f'(x_j)$. 
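The weights $L_k'(x_j)$ in Eq. (4.2) can be computed directly from the nodes. The sketch below is illustrative only: the function name `lagrange_derivative_weights` is hypothetical, and it relies on the standard identities $L_j'(x_j) = \sum_{i \neq j} 1/(x_j - x_i)$ and, for $k \neq j$, $L_k'(x_j) = \prod_{i \neq j,k}(x_j - x_i) / \prod_{i \neq k}(x_k - x_i)$, which follow from differentiating the product form of $L_k$.

```python
def lagrange_derivative_weights(nodes, j):
    """Return the weights L_k'(x_j), k = 0..n, for distinct nodes x_0..x_n."""
    xj = nodes[j]
    n = len(nodes)
    weights = []
    for k in range(n):
        if k == j:
            # L_j'(x_j) = sum over i != j of 1 / (x_j - x_i)
            w = sum(1.0 / (xj - nodes[i]) for i in range(n) if i != j)
        else:
            # L_k'(x_j) = prod_{i != j,k} (x_j - x_i) / prod_{i != k} (x_k - x_i)
            num = 1.0
            for i in range(n):
                if i != j and i != k:
                    num *= xj - nodes[i]
            den = 1.0
            for i in range(n):
                if i != k:
                    den *= nodes[k] - nodes[i]
            w = num / den
        weights.append(w)
    return weights

h = 0.1
nodes = [0.0, h, 2 * h]
endpoint_weights = lagrange_derivative_weights(nodes, 0)
midpoint_weights = lagrange_derivative_weights(nodes, 1)
print(endpoint_weights)
print(midpoint_weights)
```

For three equally spaced nodes the weights at the left node should be close to $-3/(2h)$, $4/(2h)$, $-1/(2h)$, and at the middle node close to $-1/(2h)$, $0$, $1/(2h)$.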

In general, using more evaluation points in Eq. (4.2) produces greater accuracy, al- 
though the number of functional evaluations and growth of round-off error discourages this 
somewhat. The most common formulas are those involving three and five evaluation points. 

We first derive some useful three-point formulas and consider aspects of their errors. 


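Because 

\[ L_0(x) = \frac{(x - x_1)(x - x_2)}{(x_0 - x_1)(x_0 - x_2)}, \quad \text{we have} \quad L_0'(x) = \frac{2x - x_1 - x_2}{(x_0 - x_1)(x_0 - x_2)}. \]

Similarly, 

\[ L_1'(x) = \frac{2x - x_0 - x_2}{(x_1 - x_0)(x_1 - x_2)} \quad \text{and} \quad L_2'(x) = \frac{2x - x_0 - x_1}{(x_2 - x_0)(x_2 - x_1)}. \]

Hence, from Eq. (4.2), 

\[ f'(x_j) = f(x_0)\left[ \frac{2x_j - x_1 - x_2}{(x_0 - x_1)(x_0 - x_2)} \right] + f(x_1)\left[ \frac{2x_j - x_0 - x_2}{(x_1 - x_0)(x_1 - x_2)} \right] + f(x_2)\left[ \frac{2x_j - x_0 - x_1}{(x_2 - x_0)(x_2 - x_1)} \right] + \frac{1}{6} f^{(3)}(\xi_j) \prod_{\substack{k=0 \\ k \neq j}}^{2} (x_j - x_k), \tag{4.3} \]

for each $j = 0, 1, 2$, where the notation $\xi_j$ indicates that this point depends on $x_j$. 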


Three-Point Formulas 


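The formulas from Eq. (4.3) become especially useful if the nodes are equally spaced, that 
is, when 

\[ x_1 = x_0 + h \quad \text{and} \quad x_2 = x_0 + 2h, \quad \text{for some } h \neq 0. \]

We will assume equally spaced nodes throughout the remainder of this section. 
Using Eq. (4.3) with $x_j = x_0$, $x_1 = x_0 + h$, and $x_2 = x_0 + 2h$ gives 

\[ f'(x_0) = \frac{1}{h}\left[ -\frac{3}{2} f(x_0) + 2 f(x_1) - \frac{1}{2} f(x_2) \right] + \frac{h^2}{3} f^{(3)}(\xi_0). \]

Doing the same for $x_j = x_1$ gives 

\[ f'(x_1) = \frac{1}{h}\left[ -\frac{1}{2} f(x_0) + \frac{1}{2} f(x_2) \right] - \frac{h^2}{6} f^{(3)}(\xi_1), \]

and for $x_j = x_2$, 

\[ f'(x_2) = \frac{1}{h}\left[ \frac{1}{2} f(x_0) - 2 f(x_1) + \frac{3}{2} f(x_2) \right] + \frac{h^2}{3} f^{(3)}(\xi_2). \]

Since $x_1 = x_0 + h$ and $x_2 = x_0 + 2h$, these formulas can also be expressed as 

\[ f'(x_0) = \frac{1}{h}\left[ -\frac{3}{2} f(x_0) + 2 f(x_0 + h) - \frac{1}{2} f(x_0 + 2h) \right] + \frac{h^2}{3} f^{(3)}(\xi_0), \]

\[ f'(x_0 + h) = \frac{1}{h}\left[ -\frac{1}{2} f(x_0) + \frac{1}{2} f(x_0 + 2h) \right] - \frac{h^2}{6} f^{(3)}(\xi_1), \]

and 

\[ f'(x_0 + 2h) = \frac{1}{h}\left[ \frac{1}{2} f(x_0) - 2 f(x_0 + h) + \frac{3}{2} f(x_0 + 2h) \right] + \frac{h^2}{3} f^{(3)}(\xi_2). \]

As a matter of convenience, the variable substitution $x_0$ for $x_0 + h$ is used in the middle 
equation to change this formula to an approximation for $f'(x_0)$. A similar change, $x_0$ for 
$x_0 + 2h$, is used in the last equation. This gives three formulas for approximating $f'(x_0)$: 

\[ f'(x_0) = \frac{1}{2h}\left[ -3 f(x_0) + 4 f(x_0 + h) - f(x_0 + 2h) \right] + \frac{h^2}{3} f^{(3)}(\xi_0), \]

\[ f'(x_0) = \frac{1}{2h}\left[ -f(x_0 - h) + f(x_0 + h) \right] - \frac{h^2}{6} f^{(3)}(\xi_1), \]

and 

\[ f'(x_0) = \frac{1}{2h}\left[ f(x_0 - 2h) - 4 f(x_0 - h) + 3 f(x_0) \right] + \frac{h^2}{3} f^{(3)}(\xi_2). \]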


Finally, note that the last of these equations can be obtained from the first by simply replacing 
h with —h, so there are actually only two formulas: 


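Three-Point Endpoint Formula 

\[ f'(x_0) = \frac{1}{2h}\left[ -3 f(x_0) + 4 f(x_0 + h) - f(x_0 + 2h) \right] + \frac{h^2}{3} f^{(3)}(\xi_0), \tag{4.4} \]

where $\xi_0$ lies between $x_0$ and $x_0 + 2h$. 


Three-Point Midpoint Formula 

\[ f'(x_0) = \frac{1}{2h}\left[ f(x_0 + h) - f(x_0 - h) \right] - \frac{h^2}{6} f^{(3)}(\xi_1), \tag{4.5} \]

where $\xi_1$ lies between $x_0 - h$ and $x_0 + h$. 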

Although the errors in both Eq. (4.4) and Eq. (4.5) are $O(h^2)$, the error in Eq. (4.5) is 
approximately half the error in Eq. (4.4). This is because Eq. (4.5) uses data on both sides of 
$x_0$ and Eq. (4.4) uses data on only one side. Note also that $f$ needs to be evaluated at only two 
points in Eq. (4.5), whereas in Eq. (4.4) three evaluations are needed. Figure 4.2 
gives an illustration of the approximation produced from Eq. (4.5). The approximation in 
Eq. (4.4) is useful near the ends of an interval, because information about $f$ outside the 
interval may not be available. 

[Figure 4.2: the tangent line with slope $f'(x_0)$ compared with the secant line with slope $\frac{1}{2h}[f(x_0 + h) - f(x_0 - h)]$.] 


Example 2 


Five-Point Formulas 


The methods presented in Eqs. (4.4) and (4.5) are called three-point formulas (even though 
the third point $f(x_0)$ does not appear in Eq. (4.5)). Similarly, there are five-point formulas 
that involve evaluating the function at two additional points. The error term for these 
formulas is $O(h^4)$. One common five-point formula is used to determine approximations for 
the derivative at the midpoint. 


Five-Point Midpoint Formula 

\[ f'(x_0) = \frac{1}{12h}\left[ f(x_0 - 2h) - 8 f(x_0 - h) + 8 f(x_0 + h) - f(x_0 + 2h) \right] + \frac{h^4}{30} f^{(5)}(\xi), \tag{4.6} \]

where $\xi$ lies between $x_0 - 2h$ and $x_0 + 2h$. 


The derivation of this formula is considered in Section 4.2. The other five-point formula is 
used for approximations at the endpoints. 


Five-Point Endpoint Formula 

\[ f'(x_0) = \frac{1}{12h}\left[ -25 f(x_0) + 48 f(x_0 + h) - 36 f(x_0 + 2h) + 16 f(x_0 + 3h) - 3 f(x_0 + 4h) \right] + \frac{h^4}{5} f^{(5)}(\xi), \tag{4.7} \]

where $\xi$ lies between $x_0$ and $x_0 + 4h$. 


Left-endpoint approximations are found using this formula with $h > 0$ and right-endpoint 
approximations with $h < 0$. The five-point endpoint formula is particularly useful for the 
clamped cubic spline interpolation of Section 3.5. 


Values for $f(x) = xe^x$ are given in Table 4.2. Use all the applicable three-point and five-point 
formulas to approximate $f'(2.0)$. 

Table 4.2 
x   | f(x) 
1.8 | 10.889365 
1.9 | 12.703199 
2.0 | 14.778112 
2.1 | 17.148957 
2.2 | 19.855030 

Solution The data in the table permit us to find four different three-point approximations. 
We can use the endpoint formula (4.4) with $h = 0.1$ or with $h = -0.1$, and we can use the 
midpoint formula (4.5) with $h = 0.1$ or with $h = 0.2$. 

Using the endpoint formula (4.4) with $h = 0.1$ gives 

\[ \frac{1}{0.2}\left[ -3 f(2.0) + 4 f(2.1) - f(2.2) \right] = 5\left[ -3(14.778112) + 4(17.148957) - 19.855030 \right] = 22.032310, \]

and with $h = -0.1$ gives 22.054525. 

Using the midpoint formula (4.5) with $h = 0.1$ gives 

\[ \frac{1}{0.2}\left[ f(2.1) - f(1.9) \right] = 5(17.148957 - 12.703199) = 22.228790, \]

and with $h = 0.2$ gives 22.414163. 

The only five-point formula for which the table gives sufficient data is the midpoint 
formula (4.6) with $h = 0.1$. This gives 

\[ \frac{1}{1.2}\left[ f(1.8) - 8 f(1.9) + 8 f(2.1) - f(2.2) \right] = \frac{1}{1.2}\left[ 10.889365 - 8(12.703199) + 8(17.148957) - 19.855030 \right] = 22.166999. \]

If we had no other information, we would accept the five-point midpoint approximation using 
$h = 0.1$ as the most accurate, and expect the true value to lie between that approximation 
and the three-point midpoint approximation, that is, in the interval [22.166, 22.229]. 
The true value in this case is $f'(2.0) = (2 + 1)e^2 = 22.167168$, so the approximation 
errors are actually: 

Three-point endpoint with $h = 0.1$: $1.35 \times 10^{-1}$; 
Three-point endpoint with $h = -0.1$: $1.13 \times 10^{-1}$; 
Three-point midpoint with $h = 0.1$: $-6.16 \times 10^{-2}$; 
Three-point midpoint with $h = 0.2$: $-2.47 \times 10^{-1}$; 
Five-point midpoint with $h = 0.1$: $1.69 \times 10^{-4}$. 
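The arithmetic in Example 2 can be checked with a short script. This is a sketch under stated assumptions: the tabulated values are stored in a list and addressed by position, so a step of `+1` or `-1` table positions corresponds to $h = \pm 0.1$ and a step of `2` to $h = 0.2$; the names `F_VALUES`, `SPACING`, `I0`, and the three helper functions are illustrative, not from the text.

```python
import math

# Values of f(x) = x * exp(x) from Table 4.2, at x = 1.8, 1.9, 2.0, 2.1, 2.2.
F_VALUES = [10.889365, 12.703199, 14.778112, 17.148957, 19.855030]
SPACING = 0.1  # distance between neighboring x values in the table
I0 = 2         # position of x0 = 2.0 in the table

def three_point_endpoint(step):
    """Eq. (4.4) with h = step * SPACING (step may be negative)."""
    h = step * SPACING
    return (-3 * F_VALUES[I0] + 4 * F_VALUES[I0 + step]
            - F_VALUES[I0 + 2 * step]) / (2 * h)

def three_point_midpoint(step):
    """Eq. (4.5) with h = step * SPACING."""
    h = step * SPACING
    return (F_VALUES[I0 + step] - F_VALUES[I0 - step]) / (2 * h)

def five_point_midpoint(step):
    """Eq. (4.6) with h = step * SPACING."""
    h = step * SPACING
    return (F_VALUES[I0 - 2 * step] - 8 * F_VALUES[I0 - step]
            + 8 * F_VALUES[I0 + step] - F_VALUES[I0 + 2 * step]) / (12 * h)

exact = 3 * math.exp(2.0)  # f'(x) = (x + 1) e^x at x = 2
results = {
    "endpoint, h = 0.1": three_point_endpoint(1),
    "endpoint, h = -0.1": three_point_endpoint(-1),
    "midpoint, h = 0.1": three_point_midpoint(1),
    "midpoint, h = 0.2": three_point_midpoint(2),
    "five-point midpoint, h = 0.1": five_point_midpoint(1),
}
for name, value in results.items():
    print(f"{name:30s} {value:.6f}  error {exact - value: .2e}")
```

Addressing the table by position rather than by the floating-point value of $x$ avoids lookups that depend on sums such as `2.0 - 0.2` being represented exactly.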
Methods can also be derived to find approximations to higher derivatives of a function 
using only tabulated values of the function at various points. The derivation is algebraically 
tedious, however, so only a representative procedure will be presented. 


Expand a function $f$ in a third Taylor polynomial about a point $x_0$ and evaluate at $x_0 + h$ 
and $x_0 - h$. Then 

\[ f(x_0 + h) = f(x_0) + f'(x_0) h + \frac{1}{2} f''(x_0) h^2 + \frac{1}{6} f'''(x_0) h^3 + \frac{1}{24} f^{(4)}(\xi_1) h^4 \]

and 

\[ f(x_0 - h) = f(x_0) - f'(x_0) h + \frac{1}{2} f''(x_0) h^2 - \frac{1}{6} f'''(x_0) h^3 + \frac{1}{24} f^{(4)}(\xi_{-1}) h^4, \]

where $x_0 - h < \xi_{-1} < x_0 < \xi_1 < x_0 + h$. 
If we add these equations, the terms involving $f'(x_0)$ and $-f'(x_0)$ cancel, as do the terms involving $f'''(x_0)$, so 

\[ f(x_0 + h) + f(x_0 - h) = 2 f(x_0) + f''(x_0) h^2 + \frac{1}{24}\left[ f^{(4)}(\xi_1) + f^{(4)}(\xi_{-1}) \right] h^4. \]

Solving this equation for $f''(x_0)$ gives 

\[ f''(x_0) = \frac{1}{h^2}\left[ f(x_0 - h) - 2 f(x_0) + f(x_0 + h) \right] - \frac{h^2}{24}\left[ f^{(4)}(\xi_1) + f^{(4)}(\xi_{-1}) \right]. \tag{4.8} \]

Suppose $f^{(4)}$ is continuous on $[x_0 - h, x_0 + h]$. Since $\frac{1}{2}[f^{(4)}(\xi_1) + f^{(4)}(\xi_{-1})]$ is between 
$f^{(4)}(\xi_1)$ and $f^{(4)}(\xi_{-1})$, the Intermediate Value Theorem implies that a number $\xi$ exists 
between $\xi_1$ and $\xi_{-1}$, and hence in $(x_0 - h, x_0 + h)$, with 

\[ f^{(4)}(\xi) = \frac{1}{2}\left[ f^{(4)}(\xi_1) + f^{(4)}(\xi_{-1}) \right]. \]

This permits us to rewrite Eq. (4.8) in its final form. 


Second Derivative Midpoint Formula 

\[ f''(x_0) = \frac{1}{h^2}\left[ f(x_0 - h) - 2 f(x_0) + f(x_0 + h) \right] - \frac{h^2}{12} f^{(4)}(\xi), \tag{4.9} \]

for some $\xi$, where $x_0 - h < \xi < x_0 + h$. 
If $f^{(4)}$ is continuous on $[x_0 - h, x_0 + h]$ it is also bounded, and the approximation is $O(h^2)$. 


Example 3 

In Example 2 we used the data shown in Table 4.3 to approximate the first derivative of 
$f(x) = xe^x$ at $x = 2.0$. Use the second derivative formula (4.9) to approximate $f''(2.0)$. 

Table 4.3 
x   | f(x) 
1.8 | 10.889365 
1.9 | 12.703199 
2.0 | 14.778112 
2.1 | 17.148957 
2.2 | 19.855030 


Solution The data permit us to determine two approximations for $f''(2.0)$. Using (4.9) 
with $h = 0.1$ gives 

\[ \frac{1}{0.01}\left[ f(1.9) - 2 f(2.0) + f(2.1) \right] = 100\left[ 12.703199 - 2(14.778112) + 17.148957 \right] = 29.593200, \]

and using (4.9) with $h = 0.2$ gives 

\[ \frac{1}{0.04}\left[ f(1.8) - 2 f(2.0) + f(2.2) \right] = 25\left[ 10.889365 - 2(14.778112) + 19.855030 \right] = 29.704275. \]

Because $f''(x) = (x + 2)e^x$, the exact value is $f''(2.0) = 29.556224$. Hence the actual 
errors are $-3.70 \times 10^{-2}$ and $-1.48 \times 10^{-1}$, respectively. 
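A minimal sketch of Eq. (4.9) applied to the data of Example 3 follows. The function name `second_derivative_midpoint` is an illustrative choice, and the function values are copied from Table 4.3.

```python
import math

def second_derivative_midpoint(f_minus, f_zero, f_plus, h):
    """Eq. (4.9) without its error term: [f(x0-h) - 2 f(x0) + f(x0+h)] / h^2."""
    return (f_minus - 2.0 * f_zero + f_plus) / h**2

# Tabulated values of f(x) = x * exp(x) from Table 4.3.
approx_h01 = second_derivative_midpoint(12.703199, 14.778112, 17.148957, 0.1)
approx_h02 = second_derivative_midpoint(10.889365, 14.778112, 19.855030, 0.2)

exact = 4.0 * math.exp(2.0)  # f''(x) = (x + 2) e^x at x = 2
print(f"h = 0.1: {approx_h01:.6f}  error {exact - approx_h01: .2e}")
print(f"h = 0.2: {approx_h02:.6f}  error {exact - approx_h02: .2e}")
```

Halving $h$ from 0.2 to 0.1 should reduce the error by roughly a factor of four, which is what an $O(h^2)$ error term predicts.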


Round-Off Error Instability 


It is particularly important to pay attention to round-off error when approximating 
derivatives. To illustrate the situation, let us examine the three-point midpoint formula Eq. (4.5), 

\[ f'(x_0) = \frac{1}{2h}\left[ f(x_0 + h) - f(x_0 - h) \right] - \frac{h^2}{6} f^{(3)}(\xi_1), \]

more closely. Suppose that in evaluating $f(x_0 + h)$ and $f(x_0 - h)$ we encounter round-off 
errors $e(x_0 + h)$ and $e(x_0 - h)$. Then our computations actually use the values $\tilde{f}(x_0 + h)$ 
and $\tilde{f}(x_0 - h)$, which are related to the true values $f(x_0 + h)$ and $f(x_0 - h)$ by 

\[ f(x_0 + h) = \tilde{f}(x_0 + h) + e(x_0 + h) \quad \text{and} \quad f(x_0 - h) = \tilde{f}(x_0 - h) + e(x_0 - h). \]

The total error in the approximation, 

\[ f'(x_0) - \frac{\tilde{f}(x_0 + h) - \tilde{f}(x_0 - h)}{2h} = \frac{e(x_0 + h) - e(x_0 - h)}{2h} - \frac{h^2}{6} f^{(3)}(\xi_1), \]

is due both to round-off error, the first part, and to truncation error. If we assume that the 
round-off errors $e(x_0 \pm h)$ are bounded by some number $\varepsilon > 0$ and that the third derivative 
of $f$ is bounded by a number $M > 0$, then 

\[ \left| f'(x_0) - \frac{\tilde{f}(x_0 + h) - \tilde{f}(x_0 - h)}{2h} \right| \le \frac{\varepsilon}{h} + \frac{h^2}{6} M. \]

To reduce the truncation error, $h^2 M/6$, we need to reduce $h$. But as $h$ is reduced, the 
round-off error $\varepsilon/h$ grows. In practice, then, it is seldom advantageous to let $h$ be too small, because 
in that case the round-off error will dominate the calculations. 


Illustration Consider using the values in Table 4.4 to approximate $f'(0.900)$, where $f(x) = \sin x$. The 
true value is $\cos 0.900 = 0.62161$. The formula 

\[ f'(0.900) \approx \frac{f(0.900 + h) - f(0.900 - h)}{2h}, \]

with different values of $h$, gives the approximations in Table 4.5. 

Table 4.4 
x     | sin x   | x     | sin x 
0.800 | 0.71736 | 0.901 | 0.78395 
0.850 | 0.75128 | 0.902 | 0.78457 
0.880 | 0.77074 | 0.905 | 0.78643 
0.890 | 0.77707 | 0.910 | 0.78950 
0.895 | 0.78021 | 0.920 | 0.79560 
0.898 | 0.78208 | 0.950 | 0.81342 
0.899 | 0.78270 | 1.000 | 0.84147 

Table 4.5 
h     | Approximation to f'(0.900) | Error 
0.001 | 0.62500 | 0.00339 
0.002 | 0.62250 | 0.00089 
0.005 | 0.62200 | 0.00039 
0.010 | 0.62150 | -0.00011 
0.020 | 0.62150 | -0.00011 
0.050 | 0.62140 | -0.00021 
0.100 | 0.62055 | -0.00106 


The optimal choice for $h$ appears to lie between 0.005 and 0.05. We can use calculus to 
verify (see Exercise 29) that a minimum for 

\[ e(h) = \frac{\varepsilon}{h} + \frac{h^2}{6} M \]

occurs at $h = \sqrt[3]{3\varepsilon/M}$, where 

\[ M = \max_{x \in [0.800, 1.00]} |f'''(x)| = \max_{x \in [0.800, 1.00]} |\cos x| = \cos 0.8 \approx 0.69671. \]

Because values of $f$ are given to five decimal places, we will assume that the round-off 
error is bounded by $\varepsilon = 5 \times 10^{-6}$. Therefore, the optimal choice of $h$ is approximately 

\[ h = \sqrt[3]{\frac{3(0.000005)}{0.69671}} \approx 0.028, \]

which is consistent with the results in Table 4.5. 

In practice, we cannot compute an optimal $h$ to use in approximating the derivative, since 
we have no knowledge of the third derivative of the function. But we must remain aware 
that reducing the step size will not always improve the approximation. 
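The trade-off between round-off and truncation error in the Illustration can be explored numerically. The sketch below assumes that the entries of Table 4.4 are values of $\sin x$ rounded to five decimal places, and simulates them with Python's `round`; the names `tabulated_sin` and `midpoint_from_table` are illustrative.

```python
import math

X0 = 0.900
DIGITS = 5  # assumption: the table lists sin x rounded to five decimal places

def tabulated_sin(x):
    """sin x as it would appear in a five-decimal table."""
    return round(math.sin(x), DIGITS)

def midpoint_from_table(h):
    """Three-point midpoint formula (4.5) applied to the rounded values."""
    return (tabulated_sin(X0 + h) - tabulated_sin(X0 - h)) / (2 * h)

exact = math.cos(X0)
errors = {}
for h in (0.001, 0.002, 0.005, 0.010, 0.020, 0.050, 0.100):
    approx = midpoint_from_table(h)
    errors[h] = approx - exact
    print(f"h = {h:.3f}  approx = {approx:.5f}  error = {errors[h]: .5f}")

# Minimizer of e(h) = eps/h + (h^2/6) M, with eps and M as in the text.
eps = 5e-6
M = math.cos(0.8)
h_opt = (3 * eps / M) ** (1 / 3)
print(f"estimated optimal h = {h_opt:.4f}")
```

The smallest step sizes give the worst results here because the difference of two rounded table entries is divided by a very small $2h$.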


We have considered only the round-off error problems that are presented by the three-point 
formula Eq. (4.5), but similar difficulties occur with all the differentiation formulas. 
The reason can be traced to the need to divide by a power of $h$. As we found in Section 1.2 
(see, in particular, Example 3), division by small numbers tends to exaggerate round-off 
error, and this operation should be avoided if possible. In the case of numerical 
differentiation, we cannot avoid the problem entirely, although the higher-order methods reduce the 
difficulty. 

Keep in mind that difference 
method approximations might be 
unstable. 

As an approximation method, numerical differentiation is unstable, since the small values 
of $h$ needed to reduce truncation error also cause the round-off error to grow. This is the first 
class of unstable methods we have encountered, and these techniques would be avoided if it 
were possible. However, in addition to being used for computational purposes, the formulas 
are needed for approximating the solutions of ordinary and partial differential equations. 


EXERCISE SET 4.1 


1. Use the forward-difference formulas and backward-difference formulas to determine each missing 
entry in the following tables. 

a. x   | f(x)   | f'(x) 
   0.5 | 0.4794 | 
   0.6 | 0.5646 | 
   0.7 | 0.6442 | 

b. x   | f(x)    | f'(x) 
   0.0 | 0.00000 | 
   0.2 | 0.74140 | 
   0.4 | 1.3718  | 

2. Use the forward-difference formulas and backward-difference formulas to determine each missing 
entry in the following tables. 

a. x    | f(x)   | f'(x) 
   -0.3 | 1.9507 | 
   -0.2 | 2.0421 | 
   -0.1 | 2.0601 | 

b. x   | f(x)   | f'(x) 
   1.0 | 1.0000 | 
   1.2 | 1.2625 | 
   1.4 | 1.6595 | 

3. The data in Exercise 1 were taken from the following functions. Compute the actual errors in 
Exercise 1, and find error bounds using the error formulas. 

a. $f(x) = \sin x$    b. $f(x) = e^x - 2x^2 + 3x - 1$ 

4. The data in Exercise 2 were taken from the following functions. Compute the actual errors in 
Exercise 2, and find error bounds using the error formulas. 

a. $f(x) = 2\cos 2x - x$    b. $f(x) = x^2 \ln x + 1$ 
5. Use the most accurate three-point formula to determine each missing entry in the following tables. 

a. x   | f(x)     | f'(x) 
   1.1 | 9.025013 | 
   1.2 | 11.02318 | 
   1.3 | 13.46374 | 
   1.4 | 16.44465 | 

b. x   | f(x)     | f'(x) 
   8.1 | 16.94410 | 
   8.3 | 17.56492 | 
   8.5 | 18.19056 | 
   8.7 | 18.82091 | 

c. x   | f(x)      | f'(x) 
   2.9 | -4.827866 | 
   3.0 | -4.240058 | 
   3.1 | -3.496909 | 
   3.2 | -2.596792 | 

d. x   | f(x)      | f'(x) 
   2.0 | 3.6887983 | 
   2.1 | 3.6905701 | 
   2.2 | 3.6688192 | 
   2.3 | 3.6245909 | 

6. Use the most accurate three-point formula to determine each missing entry in the following tables. 

a. x    | f(x)     | f'(x) 
   -0.3 | -0.27652 | 
   -0.2 | -0.25074 | 
   -0.1 | -0.16134 | 
   0    | 0        | 

b. x   | f(x)     | f'(x) 
   7.4 | -68.3193 | 
   7.6 | -71.6982 | 
   7.8 | -75.1576 | 
   8.0 | -78.6974 | 

c. x   | f(x)    | f'(x) 
   1.1 | 1.52918 | 
   1.2 | 1.64024 | 
   1.3 | 1.70470 | 
   1.4 | 1.71277 | 

d. x    | f(x)     | f'(x) 
   -2.7 | 0.054797 | 
   -2.5 | 0.11342  | 
   -2.3 | 0.65536  | 
   -2.1 | 0.98472  | 

7. The data in Exercise 5 were taken from the following functions. Compute the actual errors in 
Exercise 5, and find error bounds using the error formulas. 

a. $f(x) = e^{2x}$    b. $f(x) = x \ln x$ 
c. $f(x) = x\cos x - x^2 \sin x$    d. $f(x) = 2(\ln x)^2 + 3\sin x$ 

8. The data in Exercise 6 were taken from the following functions. Compute the actual errors in 
Exercise 6, and find error bounds using the error formulas. 

a. $f(x) = e^{2x} - \cos 2x$    b. $f(x) = \ln(x + 2) - (x + 1)^2$ 
c. $f(x) = x\sin x + x^2 \cos x$    d. $f(x) = (\cos 3x)^2 - e^{2x}$ 

9. Use the formulas given in this section to determine, as accurately as possible, approximations for each 
missing entry in the following tables. 

a. x   | f(x)       | f'(x) 
   2.1 | -1.709847  | 
   2.2 | -1.373823  | 
   2.3 | -1.119214  | 
   2.4 | -0.9160143 | 
   2.5 | -0.7470223 | 
   2.6 | -0.6015966 | 

b. x    | f(x)     | f'(x) 
   -3.0 | 9.367879 | 
   -2.8 | 8.233241 | 
   -2.6 | 7.180350 | 
   -2.4 | 6.209329 | 
   -2.2 | 5.320305 | 
   -2.0 | 4.513417 | 

10. Use the formulas given in this section to determine, as accurately as possible, approximations for each 
missing entry in the following tables. 

a. x    | f(x)       | f'(x) 
   1.05 | -1.709847  | 
   1.10 | -1.373823  | 
   1.15 | -1.119214  | 
   1.20 | -0.9160143 | 
   1.25 | -0.7470223 | 
   1.30 | -0.6015966 | 

b. x    | f(x)     | f'(x) 
   -3.0 | 16.08554 | 
   -2.8 | 12.64465 | 
   -2.6 | 9.863738 | 
   -2.4 | 7.623176 | 
   -2.2 | 5.825013 | 
   -2.0 | 4.389056 | 

11. The data in Exercise 9 were taken from the following functions. Compute the actual errors in 
Exercise 9, and find error bounds using the error formulas and Maple. 

a. $f(x) = \tan x$    b. $f(x) = e^{x/3} + x^2$ 

12. The data in Exercise 10 were taken from the following functions. Compute the actual errors in 
Exercise 10, and find error bounds using the error formulas and Maple. 

a. $f(x) = \tan 2x$    b. $f(x) = e^{-x} - 1 + x$ 

13. Use the following data and the knowledge that the first five derivatives of $f$ are bounded on [1, 5] by 
2, 3, 6, 12 and 23, respectively, to approximate $f'(3)$ as accurately as possible. Find a bound for the 
error. 

x    | 1      | 2      | 3      | 4      | 5 
f(x) | 2.4142 | 2.6734 | 2.8974 | 3.0976 | 3.2804 
14. Repeat Exercise 13, assuming instead that the third derivative of $f$ is bounded on [1, 5] by 4. 

15. Repeat Exercise 1 using four-digit rounding arithmetic, and compare the errors to those in 
Exercise 3. 

16. Repeat Exercise 5 using four-digit chopping arithmetic, and compare the errors to those in 
Exercise 7. 

17. Repeat Exercise 9 using four-digit rounding arithmetic, and compare the errors to those in 
Exercise 11. 

18. Consider the following table of data: 

x    | 0.2       | 0.4       | 0.6       | 0.8       | 1.0 
f(x) | 0.9798652 | 0.9177710 | 0.8080348 | 0.6386093 | 0.3843735 

a. Use all the appropriate formulas given in this section to approximate $f'(0.4)$ and $f''(0.4)$. 
b. Use all the appropriate formulas given in this section to approximate $f'(0.6)$ and $f''(0.6)$. 

19. Let $f(x) = \cos \pi x$. Use Eq. (4.9) and the values of $f(x)$ at $x = 0.25$, 0.5, and 0.75 to approximate 
$f''(0.5)$. Compare this result to the exact value and to the approximation found in Exercise 15 of 
Section 3.5. Explain why this method is particularly accurate for this problem, and find a bound for 
the error. 

20. Let $f(x) = 3xe^x - \cos x$. Use the following data and Eq. (4.9) to approximate $f''(1.3)$ with $h = 0.1$ 
and with $h = 0.01$. 

x    | 1.20     | 1.29     | 1.30     | 1.31     | 1.40 
f(x) | 11.59006 | 13.78176 | 14.04276 | 14.30741 | 16.86187 

Compare your results to $f''(1.3)$. 

21. Consider the following table of data: 

x    | 0.2       | 0.4       | 0.6       | 0.8       | 1.0 
f(x) | 0.9798652 | 0.9177710 | 0.8080348 | 0.6386093 | 0.3843735 

a. Use Eq. (4.7) to approximate $f'(0.2)$. 
b. Use Eq. (4.7) to approximate $f'(1.0)$. 
c. Use Eq. (4.6) to approximate $f'(0.6)$. 

22. Derive an $O(h^4)$ five-point formula to approximate $f'(x_0)$ that uses $f(x_0 - h)$, $f(x_0)$, $f(x_0 + h)$, 
$f(x_0 + 2h)$, and $f(x_0 + 3h)$. [Hint: Consider the expression $A f(x_0 - h) + B f(x_0 + h) + C f(x_0 + 2h) + D f(x_0 + 3h)$. 
Expand in fourth Taylor polynomials, and choose $A$, $B$, $C$, and $D$ appropriately.] 

23. Use the formula derived in Exercise 22 and the data of Exercise 21 to approximate $f'(0.4)$ and $f'(0.8)$. 

24. a. Analyze the round-off errors, as in Example 4, for the formula 

\[ f'(x_0) = \frac{f(x_0 + h) - f(x_0)}{h} - \frac{h}{2} f''(\xi_0). \]

b. Find an optimal $h > 0$ for the function given in Example 2. 

25. In Exercise 10 of Section 3.4 data were given describing a car traveling on a straight road. That 
problem asked to predict the position and speed of the car when $t = 10$ s. Use the following times and 
positions to predict the speed at each time listed. 

Time     | 0 | 3   | 5   | 8   | 10  | 13 
Distance | 0 | 225 | 383 | 623 | 742 | 993 

26. In a circuit with impressed voltage $\mathcal{E}(t)$ and inductance $L$, Kirchhoff's first law gives the relationship 

\[ \mathcal{E}(t) = L \frac{di}{dt} + R i, \]

where $R$ is the resistance in the circuit and $i$ is the current. Suppose we measure the current for several 
values of $t$ and obtain: 

t | 1.00 | 1.01 | 1.02 | 1.03 | 1.04 
i | 3.10 | 3.12 | 3.14 | 3.18 | 3.24 

where $t$ is measured in seconds, $i$ is in amperes, the inductance $L$ is a constant 0.98 henries, and the 
resistance is 0.142 ohms. Approximate the voltage $\mathcal{E}(t)$ when $t = 1.00$, 1.01, 1.02, 1.03, and 1.04. 

27. All calculus students know that the derivative of a function $f$ at $x$ can be defined as 

\[ f'(x) = \lim_{h \to 0} \frac{f(x + h) - f(x)}{h}. \]

Choose your favorite function $f$, nonzero number $x$, and computer or calculator. Generate 
approximations $f_n'(x)$ to $f'(x)$ by 

\[ f_n'(x) = \frac{f(x + 10^{-n}) - f(x)}{10^{-n}}, \]

for $n = 1, 2, \ldots, 20$, and describe what happens. 

28. Derive a method for approximating $f'''(x_0)$ whose error term is of order $h^2$ by expanding the function 
$f$ in a fourth Taylor polynomial about $x_0$ and evaluating at $x_0 \pm h$ and $x_0 \pm 2h$. 

29. Consider the function 

\[ e(h) = \frac{\varepsilon}{h} + \frac{h^2}{6} M, \]

where $M$ is a bound for the third derivative of a function. Show that $e(h)$ has a minimum at $\sqrt[3]{3\varepsilon/M}$. 


4.2 Richardson's Extrapolation 


Lewis Fry Richardson 
(1881-1953) was the first person 
to systematically apply 
mathematics to weather 
prediction while working in 
England for the Meteorological 
Office. As a conscientious 
objector during World War I, he 
wrote extensively about the 
economic futility of warfare, 
using systems of differential 
equations to model rational 
interactions between countries. 
The extrapolation technique that 
bears his name was the 
rediscovery of a technique with 
roots that are at least as old as 
Christiaan Huygens 
(1629-1695), and possibly 
Archimedes (287-212 B.c.E.). 


Richardson’s extrapolation is used to generate high-accuracy results while using low- 
order formulas. Although the name attached to the method refers to a paper written by 
L. F. Richardson and J. A. Gaunt [RG] in 1927, the idea behind the technique is much older. 
An interesting article regarding the history and application of extrapolation can be found 
in [Joy]. 

Extrapolation can be applied whenever it is known that an approximation technique 
has an error term with a predictable form, one that depends on a parameter, usually the step 
size $h$. Suppose that for each number $h \neq 0$ we have a formula $N_1(h)$ that approximates an 
unknown constant $M$, and that the truncation error involved with the approximation has the 
form 

\[ M - N_1(h) = K_1 h + K_2 h^2 + K_3 h^3 + \cdots, \]

for some collection of (unknown) constants $K_1, K_2, K_3, \ldots$. 
The truncation error is $O(h)$, so unless there was a large variation in magnitude among 
the constants $K_1, K_2, K_3, \ldots$, 

\[ M - N_1(0.1) \approx 0.1 K_1, \quad M - N_1(0.01) \approx 0.01 K_1, \]

and, in general, $M - N_1(h) \approx K_1 h$. 

The object of extrapolation is to find an easy way to combine these rather inaccurate 
$O(h)$ approximations in an appropriate way to produce formulas with a higher-order 
truncation error. 

Suppose, for example, we can combine the $N_1(h)$ formulas to produce an $O(h^2)$ 
approximation formula, $N_2(h)$, for $M$ with 

\[ M - N_2(h) = \hat{K}_2 h^2 + \hat{K}_3 h^3 + \cdots, \]

for some, again unknown, collection of constants $\hat{K}_2, \hat{K}_3, \ldots$. Then we would have 

\[ M - N_2(0.1) \approx 0.01 \hat{K}_2, \quad M - N_2(0.01) \approx 0.0001 \hat{K}_2, \]

and so on. If the constants $K_1$ and $\hat{K}_2$ are roughly of the same magnitude, then the $N_2(h)$ 
approximations would be much better than the corresponding $N_1(h)$ approximations. The 
extrapolation continues by combining the $N_2(h)$ approximations in a manner that produces 
formulas with $O(h^3)$ truncation error, and so on. 

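To see specifically how we can generate the extrapolation formulas, consider the $O(h)$ 
formula for approximating $M$ 

\[ M = N_1(h) + K_1 h + K_2 h^2 + K_3 h^3 + \cdots. \tag{4.10} \]

The formula is assumed to hold for all positive $h$, so we replace the parameter $h$ by half its 
value. Then we have a second $O(h)$ approximation formula 

\[ M = N_1\left(\frac{h}{2}\right) + K_1 \frac{h}{2} + K_2 \frac{h^2}{4} + K_3 \frac{h^3}{8} + \cdots. \tag{4.11} \]

Subtracting Eq. (4.10) from twice Eq. (4.11) eliminates the term involving $K_1$ and gives 

\[ M = N_1\left(\frac{h}{2}\right) + \left[ N_1\left(\frac{h}{2}\right) - N_1(h) \right] + K_2\left( \frac{h^2}{2} - h^2 \right) + K_3\left( \frac{h^3}{4} - h^3 \right) + \cdots. \tag{4.12} \]

Define 

\[ N_2(h) = N_1\left(\frac{h}{2}\right) + \left[ N_1\left(\frac{h}{2}\right) - N_1(h) \right]. \]

Then Eq. (4.12) is an $O(h^2)$ approximation formula for $M$: 

\[ M = N_2(h) - \frac{K_2}{2} h^2 - \frac{3K_3}{4} h^3 - \cdots. \tag{4.13} \]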
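The definition of $N_2(h)$ translates directly into code. The sketch below is illustrative: `richardson_first_order` is a hypothetical helper name, and the $O(h)$ formula used for $N_1$ is the forward-difference quotient for $f(x) = \ln x$ at $x_0 = 1.8$ from Section 4.1.

```python
import math

def richardson_first_order(N1, h):
    """N2(h) = N1(h/2) + [N1(h/2) - N1(h)] for a formula N1 with O(h) error."""
    half = N1(h / 2)
    return half + (half - N1(h))

def N1(h, x0=1.8):
    """Forward-difference approximation to f'(x0) for f(x) = ln x."""
    return (math.log(x0 + h) - math.log(x0)) / h

exact = 1.0 / 1.8
n2 = richardson_first_order(N1, 0.1)
print(f"N1(0.1)  = {N1(0.1):.7f}  error {exact - N1(0.1): .2e}")
print(f"N1(0.05) = {N1(0.05):.7f}  error {exact - N1(0.05): .2e}")
print(f"N2(0.1)  = {n2:.7f}  error {exact - n2: .2e}")
```

The extrapolated value costs no additional function evaluations beyond those already needed for $N_1(h)$ and $N_1(h/2)$.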


In Example 1 of Section 4.1 we used the forward-difference method with $h = 0.1$ and 
$h = 0.05$ to find approximations to $f'(1.8)$ for $f(x) = \ln(x)$. Assume that this formula has 
truncation error $O(h)$ and use extrapolation on these values to see if this results in a better 
approximation. 


Solution In Example 1 of Section 4.1 we found that 

with $h = 0.1$: $f'(1.8) \approx 0.5406722$, and with $h = 0.05$: $f'(1.8) \approx 0.5479795$. 

This implies that 

\[ N_1(0.1) = 0.5406722 \quad \text{and} \quad N_1(0.05) = 0.5479795. \]

Extrapolating these results gives the new approximation 

\[ N_2(0.1) = N_1(0.05) + (N_1(0.05) - N_1(0.1)) = 0.5479795 + (0.5479795 - 0.5406722) = 0.555287. \]

The $h = 0.1$ and $h = 0.05$ results were found to be accurate to within $1.5 \times 10^{-2}$ and 
$7.7 \times 10^{-3}$, respectively. Because $f'(1.8) = 1/1.8 = 0.5555\ldots$, the extrapolated value is accurate 
to within $2.7 \times 10^{-4}$. 

Extrapolation can be applied whenever the truncation error for a formula has the form 

\[ \sum_{j=1}^{m-1} K_j h^{\alpha_j} + O(h^{\alpha_m}), \]

for a collection of constants $K_j$ and when $\alpha_1 < \alpha_2 < \alpha_3 < \cdots < \alpha_m$. Many formulas used 
for extrapolation have truncation errors that contain only even powers of $h$, that is, have the 
form 

\[ M = N_1(h) + K_1 h^2 + K_2 h^4 + K_3 h^6 + \cdots. \tag{4.14} \]

The extrapolation is much more effective than when all powers of $h$ are present because the 
averaging process produces results with errors $O(h^2), O(h^4), O(h^6), \ldots$, with essentially 
no increase in computation, over the results with errors $O(h), O(h^2), O(h^3), \ldots$. 

Assume that approximation has the form of Eq. (4.14 ). Replacing h with h/2 gives the 
O(h*) approximation formula 


2 4 6 


wen (*\ee aR oe 
ae ta he. Ga 


Subtracting Eq. (4.14) from 4 times this equation eliminates the h? term, 


3M = |4N f Ni(h K. a ht) +K- : ne 
= iG) - (A) | + Ko a )+ 3 16 5 


Dividing this equation by 3 produces an O(h*) formula 


m=tlay, (")-w@]4 2 fas _% La :s 
ae 9 ates ; 3 \4 a 16 


Defining 
N2(h) = [av f N\(h) | =N Iw A Ni(h 
2(A) 3 (3)- 1(A) (5) +3 (5) (A) |, 


produces the approximation formula with truncation error O(h*): 


M = N2(h) — K: i K ais + (4.15) 
= N2 2q 376 : : 
Now replace h in Eq. (4.15) with h/2 to produce a second O(h*) formula 
h ht Sn° 
M=N) Ky K; 
2 64 1024 


Subtracting Eq. (4.15 ) from 16 times this equation eliminates the h* term and gives 


ism =| 16, (2 No(h) jg te a. 
= 2 2 2 3 64 : 


Dividing this equation by 15 produces the new O(h°) formula 


== 16N. : N2(h) ie & 
= 75 2\5 2 3 


We now have the O(h°) approximation formula 
N3(h) = : 16N. zs No(h)| = N. i + : N. ft N>(h) 
aia.) ao ae ae IS sais 
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Continuing this procedure gives, for each j = 2,3,..., the O(h7/) approximation 


5) as Nj-1(A/2) — Nj) 


N= Nes (5 4-1] 
Table 4.6 shows the order in which the approximations are generated when 
M =N,(h) + Kih? + Koh* + K3h® +---. (4.16) 


It is conservatively assumed that the true result is accurate at least to within the agreement 
of the bottom two results in the diagonal, in this case, to within |N3(1) — Na(h)|. 


Table 4.6
$O(h^2)$          $O(h^4)$          $O(h^6)$          $O(h^8)$
1: $N_1(h)$
2: $N_1(h/2)$     3: $N_2(h)$
4: $N_1(h/4)$     5: $N_2(h/2)$     6: $N_3(h)$
7: $N_1(h/8)$     8: $N_2(h/4)$     9: $N_3(h/2)$     10: $N_4(h)$


Example 2 Taylor’s theorem can be used to show that the centered-difference formula in Eq. (4.5) to
approximate $f'(x_0)$ can be expressed with an error formula:

\[ f'(x_0) = \frac{1}{2h}[f(x_0 + h) - f(x_0 - h)] - \frac{h^2}{6} f'''(x_0) - \frac{h^4}{120} f^{(5)}(x_0) - \cdots. \]

Find approximations of order $O(h^2)$, $O(h^4)$, and $O(h^6)$ for $f'(2.0)$ when $f(x) = xe^x$ and
$h = 0.2$.


Solution The constants $K_1 = -f'''(x_0)/6$, $K_2 = -f^{(5)}(x_0)/120$, ..., are not likely to be
known, but this is not important. We only need to know that these constants exist in order
to apply extrapolation.

We have the $O(h^2)$ approximation

\[ f'(x_0) = N_1(h) - \frac{h^2}{6} f'''(x_0) - \frac{h^4}{120} f^{(5)}(x_0) - \cdots, \tag{4.17} \]

where

\[ N_1(h) = \frac{1}{2h}[f(x_0 + h) - f(x_0 - h)]. \]

This gives us the first $O(h^2)$ approximations

\[ N_1(0.2) = \frac{1}{0.4}[f(2.2) - f(1.8)] = 2.5(19.855030 - 10.889365) = 22.414160, \]

and

\[ N_1(0.1) = \frac{1}{0.2}[f(2.1) - f(1.9)] = 5(17.148957 - 12.703199) = 22.228786. \]

Combining these to produce the first $O(h^4)$ approximation gives

\[ N_2(0.2) = N_1(0.1) + \frac{1}{3}(N_1(0.1) - N_1(0.2)) = 22.228786 + \frac{1}{3}(22.228786 - 22.414160) = 22.166995. \]

To determine an $O(h^6)$ formula we need another $O(h^4)$ result, which requires us to find the
third $O(h^2)$ approximation

\[ N_1(0.05) = \frac{1}{0.1}[f(2.05) - f(1.95)] = 10(15.924197 - 13.705941) = 22.182564. \]

We can now find the $O(h^4)$ approximation

\[ N_2(0.1) = N_1(0.05) + \frac{1}{3}(N_1(0.05) - N_1(0.1)) = 22.182564 + \frac{1}{3}(22.182564 - 22.228786) = 22.167157, \]

and finally the $O(h^6)$ approximation

\[ N_3(0.2) = N_2(0.1) + \frac{1}{15}(N_2(0.1) - N_2(0.2)) = 22.167157 + \frac{1}{15}(22.167157 - 22.166995) = 22.167168. \]

We would expect the final approximation to be accurate to at least the value 22.167 because
$N_2(0.2)$ and $N_3(0.2)$ agree to this value. In fact, $N_3(0.2)$ is accurate to all the listed
digits.
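
The table-building procedure used in this example can be sketched in Python. This is a minimal illustration, not part of the original text: the function names `centered_difference` and `richardson_table` are hypothetical, and the sketch assumes the error expansion contains only even powers of h, as in Eq. (4.16).

```python
import math


def centered_difference(f, x0, h):
    """N_1(h): the O(h^2) centered-difference approximation to f'(x0)."""
    return (f(x0 + h) - f(x0 - h)) / (2.0 * h)


def richardson_table(n1, h, rows):
    """Build the extrapolation table for an error expansion in even powers of h.

    table[i][0] holds N_1(h / 2^i); for j >= 1, table[i][j] holds
    N_{j+1}(h / 2^(i-j)), computed from
    N_{j+1}(h) = N_j(h/2) + (N_j(h/2) - N_j(h)) / (4^j - 1).
    """
    table = []
    for i in range(rows):
        row = [n1(h / 2**i)]
        for j in range(1, i + 1):
            finer = row[j - 1]               # N_j at the halved step size
            coarser = table[i - 1][j - 1]    # N_j at the full step size
            row.append(finer + (finer - coarser) / (4**j - 1))
        table.append(row)
    return table


# The setting of the example: f(x) = x e^x, x0 = 2.0, h = 0.2.
f = lambda x: x * math.exp(x)
table = richardson_table(lambda h: centered_difference(f, 2.0, h), 0.2, 3)
for row in table:
    print(["%.6f" % value for value in row])
```

The last entry of the last row plays the role of $N_3(0.2)$. Because the sketch carries full double precision rather than six-decimal intermediate values, its entries may differ from the hand computation in the final printed digit.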


Each column beyond the first in the extrapolation table is obtained by a simple
averaging process, so the technique can produce high-order approximations with minimal
computational cost. However, as $k$ increases, the round-off error in $N_1(h/2^k)$ will generally
increase because the instability of numerical differentiation is related to the step size $h/2^k$.
Also, the higher-order formulas depend increasingly on the entry to their immediate left in
the table, which is the reason we recommend comparing the final diagonal entries to ensure
accuracy.

In Section 4.1, we discussed both three- and five-point methods for approximating
$f'(x_0)$ given various functional values of $f$. The three-point methods were derived by
differentiating a Lagrange interpolating polynomial for $f$. The five-point methods can be
obtained in a similar manner, but the derivation is tedious. Extrapolation can be used to
more easily derive these formulas, as illustrated below.


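Illustration Suppose we expand the function $f$ in a fourth Taylor polynomial about $x_0$. Then

\[ f(x) = f(x_0) + f'(x_0)(x - x_0) + \frac{1}{2} f''(x_0)(x - x_0)^2 + \frac{1}{6} f'''(x_0)(x - x_0)^3 + \frac{1}{24} f^{(4)}(x_0)(x - x_0)^4 + \frac{1}{120} f^{(5)}(\xi)(x - x_0)^5, \]

for some number $\xi$ between $x$ and $x_0$. Evaluating $f$ at $x_0 + h$ and $x_0 - h$ gives

\[ f(x_0 + h) = f(x_0) + f'(x_0)h + \frac{1}{2} f''(x_0)h^2 + \frac{1}{6} f'''(x_0)h^3 + \frac{1}{24} f^{(4)}(x_0)h^4 + \frac{1}{120} f^{(5)}(\xi_1)h^5 \tag{4.18} \]

and

\[ f(x_0 - h) = f(x_0) - f'(x_0)h + \frac{1}{2} f''(x_0)h^2 - \frac{1}{6} f'''(x_0)h^3 + \frac{1}{24} f^{(4)}(x_0)h^4 - \frac{1}{120} f^{(5)}(\xi_2)h^5, \tag{4.19} \]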


where $x_0 - h < \xi_2 < x_0 < \xi_1 < x_0 + h$.

Subtracting Eq. (4.19) from Eq. (4.18) gives a new approximation for $f'(x_0)$:

\[ f(x_0 + h) - f(x_0 - h) = 2h f'(x_0) + \frac{h^3}{3} f'''(x_0) + \frac{h^5}{120}\left[ f^{(5)}(\xi_1) + f^{(5)}(\xi_2) \right], \tag{4.20} \]

which implies that

\[ f'(x_0) = \frac{1}{2h}[f(x_0 + h) - f(x_0 - h)] - \frac{h^2}{6} f'''(x_0) - \frac{h^4}{240}\left[ f^{(5)}(\xi_1) + f^{(5)}(\xi_2) \right]. \]

If $f^{(5)}$ is continuous on $[x_0 - h, x_0 + h]$, the Intermediate Value Theorem 1.11 implies that
a number $\tilde{\xi}$ in $(x_0 - h, x_0 + h)$ exists with

\[ f^{(5)}(\tilde{\xi}) = \frac{1}{2}\left[ f^{(5)}(\xi_1) + f^{(5)}(\xi_2) \right]. \]

As a consequence, we have the $O(h^2)$ approximation

\[ f'(x_0) = \frac{1}{2h}[f(x_0 + h) - f(x_0 - h)] - \frac{h^2}{6} f'''(x_0) - \frac{h^4}{120} f^{(5)}(\tilde{\xi}). \tag{4.21} \]


Although the approximation in Eq. (4.21) is the same as that given in the three-point
formula in Eq. (4.5), the unknown evaluation point occurs now in $f^{(5)}$, rather than in $f'''$.
Extrapolation takes advantage of this by first replacing $h$ in Eq. (4.21) with $2h$ to give the
new formula

\[ f'(x_0) = \frac{1}{4h}[f(x_0 + 2h) - f(x_0 - 2h)] - \frac{4h^2}{6} f'''(x_0) - \frac{16h^4}{120} f^{(5)}(\hat{\xi}), \tag{4.22} \]

where $\hat{\xi}$ is between $x_0 - 2h$ and $x_0 + 2h$.


Multiplying Eq. (4.21) by 4 and subtracting Eq. (4.22) produces

\[ 3f'(x_0) = \frac{2}{h}[f(x_0 + h) - f(x_0 - h)] - \frac{1}{4h}[f(x_0 + 2h) - f(x_0 - 2h)] - \frac{h^4}{30} f^{(5)}(\tilde{\xi}) + \frac{2h^4}{15} f^{(5)}(\hat{\xi}). \]

Even if $f^{(5)}$ is continuous on $[x_0 - 2h, x_0 + 2h]$, the Intermediate Value Theorem 1.11
cannot be applied as we did to derive Eq. (4.21) because here we have the difference of
terms involving $f^{(5)}$. However, an alternative method can be used to show that $f^{(5)}(\tilde{\xi})$ and
$f^{(5)}(\hat{\xi})$ can still be replaced by a common value $f^{(5)}(\xi)$. Assuming this and dividing by 3
produces the five-point midpoint formula Eq. (4.6) that we saw in Section 4.1:

\[ f'(x_0) = \frac{1}{12h}[f(x_0 - 2h) - 8f(x_0 - h) + 8f(x_0 + h) - f(x_0 + 2h)] + \frac{h^4}{30} f^{(5)}(\xi). \]


Other formulas for first and higher derivatives can be derived in a similar manner. See, 
for example, Exercise 8. 
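
The claim that the five-point midpoint formula is one extrapolation step applied to the centered difference can be checked numerically. The following Python sketch is illustrative only; the function names are hypothetical, and the test function $f(x) = xe^x$ at $x_0 = 2.0$ is borrowed from the earlier example as an assumption.

```python
import math


def centered_difference(f, x0, h):
    """Three-point centered difference, error O(h^2)."""
    return (f(x0 + h) - f(x0 - h)) / (2.0 * h)


def five_point_midpoint(f, x0, h):
    """Five-point midpoint formula, error O(h^4)."""
    return (f(x0 - 2 * h) - 8 * f(x0 - h) + 8 * f(x0 + h) - f(x0 + 2 * h)) / (12.0 * h)


def extrapolated(f, x0, h):
    """One extrapolation step: (4 N(h) - N(2h)) / 3 with N the centered difference."""
    return (4.0 * centered_difference(f, x0, h)
            - centered_difference(f, x0, 2 * h)) / 3.0


f = lambda x: x * math.exp(x)
direct = five_point_midpoint(f, 2.0, 0.1)
via_extrapolation = extrapolated(f, 2.0, 0.1)
# The two values agree up to floating-point round-off.
print(direct, via_extrapolation, abs(direct - via_extrapolation))
```

Algebraically, $(4N(h) - N(2h))/3$ expands to exactly the five-point formula, so any difference printed here is round-off rather than truncation error.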

The technique of extrapolation is used throughout the text. The most prominent applications
occur in approximating integrals in Section 4.5 and for determining approximate
solutions to differential equations in Section 5.8.

EXERCISE SET 4.2


1. Apply the extrapolation process described in Example 1 to determine $N_3(h)$, an approximation to
$f'(x_0)$, for the following functions and stepsizes.

a. $f(x) = \ln x$, $x_0 = 1.0$, $h = 0.4$
b. $f(x) = x + e^x$, $x_0 = 0.0$, $h = 0.4$
c. $f(x) = 2^x \sin x$, $x_0 = 1.05$, $h = 0.4$
d. $f(x) = x^3 \cos x$, $x_0 = 2.3$, $h = 0.4$

2. Add another line to the extrapolation table in Exercise 1 to obtain the approximation $N_4(h)$.

3. Repeat Exercise 1 using four-digit rounding arithmetic.

4. Repeat Exercise 2 using four-digit rounding arithmetic.

5. The following data give approximations to the integral

\[ M = \int_0^{\pi} \sin x \, dx. \]

\[ N_1(h) = 1.570796, \quad N_1\left(\frac{h}{2}\right) = 1.896119, \quad N_1\left(\frac{h}{4}\right) = 1.974232, \quad N_1\left(\frac{h}{8}\right) = 1.993570. \]

Assuming $M = N_1(h) + K_1 h^2 + K_2 h^4 + K_3 h^6 + K_4 h^8 + O(h^{10})$, construct an extrapolation table to
determine $N_4(h)$.


6. The following data can be used to approximate the integral

\[ M = \int_0^{3\pi/2} \cos x \, dx. \]

\[ N_1(h) = 2.356194, \quad N_1\left(\frac{h}{2}\right) = -0.4879837, \quad N_1\left(\frac{h}{4}\right) = -0.8815732, \quad N_1\left(\frac{h}{8}\right) = -0.9709157. \]

Assume a formula exists of the type given in Exercise 5 and determine $N_4(h)$.

7. Show that the five-point formula in Eq. (4.6) applied to $f(x) = xe^x$ at $x_0 = 2.0$ gives $N_2(0.2)$ in Table
4.6 when $h = 0.1$ and $N_2(0.1)$ when $h = 0.05$.


8. The forward-difference formula can be expressed as

\[ f'(x_0) = \frac{1}{h}[f(x_0 + h) - f(x_0)] - \frac{h}{2} f''(x_0) - \frac{h^2}{6} f'''(x_0) + O(h^3). \]

Use extrapolation to derive an $O(h^3)$ formula for $f'(x_0)$.

9. Suppose that $N(h)$ is an approximation to $M$ for every $h > 0$ and that

\[ M = N(h) + K_1 h + K_2 h^2 + K_3 h^3 + \cdots, \]

for some constants $K_1, K_2, K_3, \ldots$. Use the values $N(h)$, $N(h/3)$, and $N(h/9)$ to produce an $O(h^3)$
approximation to $M$.

10. Suppose that $N(h)$ is an approximation to $M$ for every $h > 0$ and that

\[ M = N(h) + K_1 h^2 + K_2 h^4 + K_3 h^6 + \cdots, \]

for some constants $K_1, K_2, K_3, \ldots$. Use the values $N(h)$, $N(h/3)$, and $N(h/9)$ to produce an $O(h^6)$
approximation to $M$.


11. In calculus, we learn that $e = \lim_{h \to 0} (1 + h)^{1/h}$.

a. Determine approximations to $e$ corresponding to $h = 0.04$, $0.02$, and $0.01$.

b. Use extrapolation on the approximations, assuming that constants $K_1, K_2, \ldots$ exist with
$e = (1 + h)^{1/h} + K_1 h + K_2 h^2 + K_3 h^3 + \cdots$, to produce an $O(h^3)$ approximation to $e$, where
$h = 0.04$.

c. Do you think that the assumption in part (b) is correct?

12. a. Show that

\[ \lim_{h \to 0} \left( \frac{2 + h}{2 - h} \right)^{1/h} = e. \]

b. Compute approximations to $e$ using the formula $N(h) = \left( \frac{2 + h}{2 - h} \right)^{1/h}$, for $h = 0.04$, $0.02$, and $0.01$.

c. Assume that $e = N(h) + K_1 h + K_2 h^2 + K_3 h^3 + \cdots$. Use extrapolation, with at least 16 digits of
precision, to compute an $O(h^3)$ approximation to $e$ with $h = 0.04$. Do you think the assumption
is correct?

d. Show that $N(-h) = N(h)$.

e. Use part (d) to show that $K_1 = K_3 = K_5 = \cdots = 0$ in the formula

\[ e = N(h) + K_1 h + K_2 h^2 + K_3 h^3 + K_4 h^4 + K_5 h^5 + \cdots, \]

so that the formula reduces to

\[ e = N(h) + K_2 h^2 + K_4 h^4 + K_6 h^6 + \cdots. \]

f. Use the results of part (e) and extrapolation to compute an $O(h^6)$ approximation to $e$ with
$h = 0.04$.

13. Suppose the following extrapolation table has been constructed to approximate the number $M$ with
$M = N_1(h) + K_1 h^2 + K_2 h^4 + K_3 h^6$:

$N_1(h)$
$N_1(h/2)$     $N_2(h)$
$N_1(h/4)$     $N_2(h/2)$     $N_3(h)$

a. Show that the linear interpolating polynomial $P_{0,1}(h)$ through $(h^2, N_1(h))$ and $(h^2/4, N_1(h/2))$
satisfies $P_{0,1}(0) = N_2(h)$. Similarly, show that $P_{1,2}(0) = N_2(h/2)$.

b. Show that the linear interpolating polynomial $P_{0,2}(h)$ through $(h^4, N_2(h))$ and $(h^4/16, N_2(h/2))$
satisfies $P_{0,2}(0) = N_3(h)$.

14. Suppose that $N_1(h)$ is a formula that produces $O(h)$ approximations to a number $M$ and that

\[ M = N_1(h) + K_1 h + K_2 h^2 + \cdots, \]

for a collection of positive constants $K_1, K_2, \ldots$. Then $N_1(h)$, $N_1(h/2)$, $N_1(h/4)$, ... are all lower
bounds for $M$. What can be said about the extrapolated approximations $N_2(h)$, $N_3(h)$, ...?

15. The semiperimeters of regular polygons with $k$ sides that inscribe and circumscribe the unit circle
were used by Archimedes before 200 B.C.E. to approximate $\pi$, the circumference of a semicircle.
Geometry can be used to show that the sequence of inscribed and circumscribed semiperimeters $\{p_k\}$
and $\{P_k\}$, respectively, satisfy

\[ p_k = k \sin\left(\frac{\pi}{k}\right) \quad \text{and} \quad P_k = k \tan\left(\frac{\pi}{k}\right), \]

with $p_k < \pi < P_k$, whenever $k \geq 4$.

a. Show that $p_4 = 2\sqrt{2}$ and $P_4 = 4$.

b. Show that for $k \geq 4$, the sequences satisfy the recurrence relations

\[ P_{2k} = \frac{2 p_k P_k}{p_k + P_k} \quad \text{and} \quad p_{2k} = \sqrt{p_k P_{2k}}. \]

c. Approximate $\pi$ to within $10^{-4}$ by computing $p_k$ and $P_k$ until $P_k - p_k < 10^{-4}$.


Copyright 2010 Cengage Learning. All Rights Reserved. May not be copied, scanned, or duplicated, in whole or in part. Due to electronic rights, some third party content may be suppressed from the eBook and/or eChapter(s). 


Editorial review has deemed that any suppressed content does not materially affect the overall learning experience. Cengage Learning reserves the right to remove additional content at any time if subsequent rights restrictions require it. 




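d. Use Taylor Series to show that

\[ \pi = p_k + \frac{\pi^3}{3!}\left(\frac{1}{k}\right)^2 - \frac{\pi^5}{5!}\left(\frac{1}{k}\right)^4 + \cdots \]

and

\[ \pi = P_k - \frac{\pi^3}{3}\left(\frac{1}{k}\right)^2 - \frac{2\pi^5}{15}\left(\frac{1}{k}\right)^4 - \cdots. \]

e. Use extrapolation with $h = 1/k$ to better approximate $\pi$.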


4.3 Elements of Numerical Integration

The need often arises for evaluating the definite integral of a function that has no explicit
antiderivative or whose antiderivative is not easy to obtain. The basic method involved in
approximating $\int_a^b f(x)\,dx$ is called numerical quadrature. It uses a sum $\sum_{i=0}^{n} a_i f(x_i)$ to
approximate $\int_a^b f(x)\,dx$.

The methods of quadrature in this section are based on the interpolation polynomials
given in Chapter 3. The basic idea is to select a set of distinct nodes $\{x_0, \ldots, x_n\}$ from the
interval $[a, b]$. Then integrate the Lagrange interpolating polynomial

\[ P_n(x) = \sum_{i=0}^{n} f(x_i) L_i(x) \]

and its truncation error term over $[a, b]$ to obtain

\[ \int_a^b f(x)\,dx = \int_a^b \sum_{i=0}^{n} f(x_i) L_i(x)\,dx + \int_a^b \prod_{i=0}^{n} (x - x_i) \frac{f^{(n+1)}(\xi(x))}{(n+1)!}\,dx = \sum_{i=0}^{n} a_i f(x_i) + \frac{1}{(n+1)!} \int_a^b \prod_{i=0}^{n} (x - x_i) f^{(n+1)}(\xi(x))\,dx, \]

where $\xi(x)$ is in $[a, b]$ for each $x$ and

\[ a_i = \int_a^b L_i(x)\,dx, \quad \text{for each } i = 0, 1, \ldots, n. \]

The quadrature formula is, therefore,

\[ \int_a^b f(x)\,dx \approx \sum_{i=0}^{n} a_i f(x_i), \]

with error given by

\[ E(f) = \frac{1}{(n+1)!} \int_a^b \prod_{i=0}^{n} (x - x_i) f^{(n+1)}(\xi(x))\,dx. \]
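
The weights $a_i$ can be computed without integrating each $L_i$ by hand. The Python sketch below is an illustration under a stated assumption: it uses the equivalent condition that an interpolatory formula on $n + 1$ distinct nodes integrates $1, x, \ldots, x^n$ exactly, and solves the resulting linear system in rational arithmetic. The function name `interpolatory_weights` is hypothetical.

```python
from fractions import Fraction


def interpolatory_weights(nodes, a, b):
    """Weights a_i with sum_i a_i * x_i**k = (b**(k+1) - a**(k+1)) / (k+1), k = 0..n.

    For distinct nodes this system has a unique solution, and that solution
    equals the integrals of the Lagrange basis polynomials over [a, b].
    """
    xs = [Fraction(x) for x in nodes]
    a, b = Fraction(a), Fraction(b)
    m = len(xs)
    # Augmented matrix [V | moments], where V[k][i] = x_i**k.
    rows = [[x**k for x in xs] + [(b**(k + 1) - a**(k + 1)) / (k + 1)]
            for k in range(m)]
    # Gauss-Jordan elimination in exact rational arithmetic.
    for col in range(m):
        pivot = next(r for r in range(col, m) if rows[r][col] != 0)
        rows[col], rows[pivot] = rows[pivot], rows[col]
        p = rows[col][col]
        rows[col] = [v / p for v in rows[col]]
        for r in range(m):
            if r != col and rows[r][col] != 0:
                factor = rows[r][col]
                rows[r] = [vr - factor * vc for vr, vc in zip(rows[r], rows[col])]
    return [rows[i][m] for i in range(m)]


# Two and three equally spaced nodes on illustrative intervals.
print(interpolatory_weights([0, 1], 0, 1))
print(interpolatory_weights([0, 1, 2], 0, 2))
```

Exact fractions are used here so that the weights come out as recognizable rational numbers instead of rounded decimals.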


Before discussing the general situation of quadrature formulas, let us consider formulas
produced by using first and second Lagrange polynomials with equally-spaced nodes. This
gives the Trapezoidal rule and Simpson’s rule, which are commonly introduced in calculus
courses.

When we use the term trapezoid
we mean a four-sided figure that
has at least two of its sides
parallel. The European term for
this figure is trapezium. To further
confuse the issue, the European
word trapezoidal refers to a
four-sided figure with no sides
equal, and the American word for
this type of figure is trapezium.


Figure 4.3 




The Trapezoidal Rule 


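To derive the Trapezoidal rule for approximating $\int_a^b f(x)\,dx$, let $x_0 = a$, $x_1 = b$, $h = b - a$
and use the linear Lagrange polynomial:

\[ P_1(x) = \frac{(x - x_1)}{(x_0 - x_1)} f(x_0) + \frac{(x - x_0)}{(x_1 - x_0)} f(x_1). \]

Then

\[ \int_a^b f(x)\,dx = \int_{x_0}^{x_1} \left[ \frac{(x - x_1)}{(x_0 - x_1)} f(x_0) + \frac{(x - x_0)}{(x_1 - x_0)} f(x_1) \right] dx + \frac{1}{2} \int_{x_0}^{x_1} f''(\xi(x))(x - x_0)(x - x_1)\,dx. \tag{4.23} \]

The product $(x - x_0)(x - x_1)$ does not change sign on $[x_0, x_1]$, so the Weighted Mean Value
Theorem for Integrals 1.13 can be applied to the error term to give, for some $\xi$ in $(x_0, x_1)$,

\[ \int_{x_0}^{x_1} f''(\xi(x))(x - x_0)(x - x_1)\,dx = f''(\xi) \int_{x_0}^{x_1} (x - x_0)(x - x_1)\,dx = f''(\xi) \left[ \frac{x^3}{3} - \frac{(x_1 + x_0)}{2} x^2 + x_0 x_1 x \right]_{x_0}^{x_1} = -\frac{h^3}{6} f''(\xi). \]

Consequently, Eq. (4.23) implies that

\[ \int_a^b f(x)\,dx = \left[ \frac{(x - x_1)^2}{2(x_0 - x_1)} f(x_0) + \frac{(x - x_0)^2}{2(x_1 - x_0)} f(x_1) \right]_{x_0}^{x_1} - \frac{h^3}{12} f''(\xi) = \frac{(x_1 - x_0)}{2}[f(x_0) + f(x_1)] - \frac{h^3}{12} f''(\xi). \]

Using the notation $h = x_1 - x_0$ gives the following rule:

Trapezoidal Rule:

\[ \int_a^b f(x)\,dx = \frac{h}{2}[f(x_0) + f(x_1)] - \frac{h^3}{12} f''(\xi). \]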


This is called the Trapezoidal rule because when $f$ is a function with positive values,
$\int_a^b f(x)\,dx$ is approximated by the area in a trapezoid, as shown in Figure 4.3.

The error term for the Trapezoidal rule involves $f''$, so the rule gives the exact
result when applied to any function whose second derivative is identically zero, that is, any
polynomial of degree one or less.


Simpson’s Rule 
Simpson’s rule results from integrating over $[a, b]$ the second Lagrange polynomial with
equally-spaced nodes $x_0 = a$, $x_2 = b$, and $x_1 = a + h$, where $h = (b - a)/2$. (See
Figure 4.4.)

Figure 4.4

Therefore

\[ \int_a^b f(x)\,dx = \int_{x_0}^{x_2} \left[ \frac{(x - x_1)(x - x_2)}{(x_0 - x_1)(x_0 - x_2)} f(x_0) + \frac{(x - x_0)(x - x_2)}{(x_1 - x_0)(x_1 - x_2)} f(x_1) + \frac{(x - x_0)(x - x_1)}{(x_2 - x_0)(x_2 - x_1)} f(x_2) \right] dx + \int_{x_0}^{x_2} \frac{(x - x_0)(x - x_1)(x - x_2)}{6} f^{(3)}(\xi(x))\,dx. \]

Deriving Simpson’s rule in this manner, however, provides only an $O(h^4)$ error term involving
$f^{(3)}$. By approaching the problem in another way, a higher-order term involving $f^{(4)}$
can be derived.

To illustrate this alternative method, suppose that $f$ is expanded in the third Taylor
polynomial about $x_1$. Then for each $x$ in $[x_0, x_2]$, a number $\xi(x)$ in $(x_0, x_2)$ exists with

\[ f(x) = f(x_1) + f'(x_1)(x - x_1) + \frac{f''(x_1)}{2}(x - x_1)^2 + \frac{f'''(x_1)}{6}(x - x_1)^3 + \frac{f^{(4)}(\xi(x))}{24}(x - x_1)^4 \]

and

\[ \int_{x_0}^{x_2} f(x)\,dx = \left[ f(x_1)(x - x_1) + \frac{f'(x_1)}{2}(x - x_1)^2 + \frac{f''(x_1)}{6}(x - x_1)^3 + \frac{f'''(x_1)}{24}(x - x_1)^4 \right]_{x_0}^{x_2} + \frac{1}{24} \int_{x_0}^{x_2} f^{(4)}(\xi(x))(x - x_1)^4\,dx. \tag{4.24} \]

Thomas Simpson (1710-1761)
was a self-taught mathematician
who supported himself during his
early years as a weaver. His
primary interest was probability
theory, although in 1750 he
published a two-volume calculus
book entitled The Doctrine and
Application of Fluxions.

Because $(x - x_1)^4$ is never negative on $[x_0, x_2]$, the Weighted Mean Value Theorem for
Integrals 1.13 implies that

\[ \frac{1}{24} \int_{x_0}^{x_2} f^{(4)}(\xi(x))(x - x_1)^4\,dx = \frac{f^{(4)}(\xi_1)}{24} \int_{x_0}^{x_2} (x - x_1)^4\,dx = \frac{f^{(4)}(\xi_1)}{120} (x - x_1)^5 \Big|_{x_0}^{x_2}, \]

for some number $\xi_1$ in $(x_0, x_2)$.

However, $h = x_2 - x_1 = x_1 - x_0$, so

\[ (x_2 - x_1)^2 - (x_0 - x_1)^2 = (x_2 - x_1)^4 - (x_0 - x_1)^4 = 0, \]

whereas

\[ (x_2 - x_1)^3 - (x_0 - x_1)^3 = 2h^3 \quad \text{and} \quad (x_2 - x_1)^5 - (x_0 - x_1)^5 = 2h^5. \]

Consequently, Eq. (4.24) can be rewritten as

\[ \int_{x_0}^{x_2} f(x)\,dx = 2h f(x_1) + \frac{h^3}{3} f''(x_1) + \frac{f^{(4)}(\xi_1)}{60} h^5. \]

If we now replace $f''(x_1)$ by the approximation given in Eq. (4.9) of Section 4.1, we
have

\[ \int_{x_0}^{x_2} f(x)\,dx = 2h f(x_1) + \frac{h^3}{3} \left\{ \frac{1}{h^2}[f(x_0) - 2f(x_1) + f(x_2)] - \frac{h^2}{12} f^{(4)}(\xi_2) \right\} + \frac{f^{(4)}(\xi_1)}{60} h^5 \]
\[ = \frac{h}{3}[f(x_0) + 4f(x_1) + f(x_2)] - \frac{h^5}{12} \left[ \frac{1}{3} f^{(4)}(\xi_2) - \frac{1}{5} f^{(4)}(\xi_1) \right]. \]

It can be shown by alternative methods (see Exercise 24) that the values $\xi_1$ and $\xi_2$ in this
expression can be replaced by a common value $\xi$ in $(x_0, x_2)$. This gives Simpson’s rule.

Simpson’s Rule:

\[ \int_{x_0}^{x_2} f(x)\,dx = \frac{h}{3}[f(x_0) + 4f(x_1) + f(x_2)] - \frac{h^5}{90} f^{(4)}(\xi). \]

The error term in Simpson’s rule involves the fourth derivative of $f$, so it gives exact
results when applied to any polynomial of degree three or less.

Example 1 Compare the Trapezoidal rule and Simpson’s rule approximations to $\int_0^2 f(x)\,dx$ when $f(x)$
is

(a) $x^2$   (b) $x^4$   (c) $(x + 1)^{-1}$
(d) $\sqrt{1 + x^2}$   (e) $\sin x$   (f) $e^x$


Solution On $[0, 2]$ the Trapezoidal and Simpson’s rule have the forms

\[ \text{Trapezoid: } \int_0^2 f(x)\,dx \approx f(0) + f(2) \quad \text{and} \]

\[ \text{Simpson’s: } \int_0^2 f(x)\,dx \approx \frac{1}{3}[f(0) + 4f(1) + f(2)]. \]

When $f(x) = x^2$ they give

\[ \text{Trapezoid: } \int_0^2 f(x)\,dx \approx 0^2 + 2^2 = 4 \quad \text{and} \]

\[ \text{Simpson’s: } \int_0^2 f(x)\,dx \approx \frac{1}{3}[(0^2) + 4 \cdot 1^2 + 2^2] = \frac{8}{3}. \]

The approximation from Simpson’s rule is exact because its truncation error involves $f^{(4)}$,
which is identically 0 when $f(x) = x^2$.

The results to three places for the functions are summarized in Table 4.7. Notice that
in each instance Simpson’s Rule is significantly superior.

Table 4.7
              (a)      (b)       (c)             (d)                (e)        (f)
f(x)          $x^2$    $x^4$     $(x+1)^{-1}$    $\sqrt{1+x^2}$     $\sin x$   $e^x$
Exact value   2.667    6.400     1.099           2.958              1.416      6.389
Trapezoidal   4.000    16.000    1.333           3.326              0.909      8.389
Simpson’s     2.667    6.667     1.111           2.964              1.425      6.421

The improved accuracy of
Simpson’s rule over the
Trapezoidal rule is intuitively
explained by the fact that
Simpson’s rule includes a
midpoint evaluation that provides
better balance to the
approximation.

The open and closed terminology
for methods implies that the open
methods use as nodes only points
in the open interval, $(a, b)$ to
approximate $\int_a^b f(x)\,dx$. The
closed methods include the points
$a$ and $b$ of the closed interval
$[a, b]$ as nodes.

Measuring Precision

The standard derivation of quadrature error formulas is based on determining the class of
polynomials for which these formulas produce exact results. The next definition is used to
facilitate the discussion of this derivation.

Definition 4.1 The degree of accuracy, or precision, of a quadrature formula is the largest positive integer
$n$ such that the formula is exact for $x^k$, for each $k = 0, 1, \ldots, n$.

Definition 4.1 implies that the Trapezoidal and Simpson’s rules have degrees of precision
one and three, respectively.

Integration and summation are linear operations; that is,

\[ \int_a^b (\alpha f(x) + \beta g(x))\,dx = \alpha \int_a^b f(x)\,dx + \beta \int_a^b g(x)\,dx \]

and

\[ \sum_{i=0}^{n} (\alpha f(x_i) + \beta g(x_i)) = \alpha \sum_{i=0}^{n} f(x_i) + \beta \sum_{i=0}^{n} g(x_i), \]

for each pair of integrable functions $f$ and $g$ and each pair of real constants $\alpha$ and $\beta$. This
implies (see Exercise 25) that:

• The degree of precision of a quadrature formula is $n$ if and only if the error is zero for
all polynomials of degree $k = 0, 1, \ldots, n$, but is not zero for some polynomial of degree
$n + 1$.
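
The degree of precision can be probed numerically by applying a rule to the monomials $x^k$ in turn. The sketch below is illustrative only: the function names, the interval $[0, 2]$, the tolerance, and the cap `max_degree` are all assumptions made for the example.

```python
def trapezoidal(f, a, b):
    """Basic Trapezoidal rule on [a, b] with h = b - a."""
    h = b - a
    return h / 2.0 * (f(a) + f(b))


def simpson(f, a, b):
    """Basic Simpson's rule on [a, b] with h = (b - a) / 2."""
    h = (b - a) / 2.0
    return h / 3.0 * (f(a) + 4.0 * f(a + h) + f(b))


def degree_of_precision(rule, a=0.0, b=2.0, max_degree=10, tol=1e-12):
    """Largest n (up to max_degree) such that rule is exact for x^k, k = 0..n."""
    degree = -1
    for k in range(max_degree + 1):
        exact = (b**(k + 1) - a**(k + 1)) / (k + 1)
        approx = rule(lambda x, k=k: x**k, a, b)
        if abs(approx - exact) > tol * max(1.0, abs(exact)):
            break
        degree = k
    return degree


print(degree_of_precision(trapezoidal))
print(degree_of_precision(simpson))
```

By linearity, exactness on each monomial up to degree $n$ is the same as exactness on every polynomial of degree at most $n$, which is why testing monomials suffices. A floating-point check like this is only a heuristic: it relies on the chosen tolerance and does not replace the error-term argument.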


The Trapezoidal and Simpson’s rules are examples of a class of methods known as Newton-Cotes
formulas. There are two types of Newton-Cotes formulas, open and closed.

Closed Newton-Cotes Formulas

The $(n + 1)$-point closed Newton-Cotes formula uses nodes $x_i = x_0 + ih$, for $i = 0, 1, \ldots, n$,
where $x_0 = a$, $x_n = b$ and $h = (b - a)/n$. (See Figure 4.5.) It is called closed because the
endpoints of the closed interval $[a, b]$ are included as nodes.

Figure 4.5

The formula assumes the form

\[ \int_a^b f(x)\,dx \approx \sum_{i=0}^{n} a_i f(x_i), \]

where

\[ a_i = \int_{x_0}^{x_n} L_i(x)\,dx = \int_{x_0}^{x_n} \prod_{\substack{j=0 \\ j \neq i}}^{n} \frac{(x - x_j)}{(x_i - x_j)}\,dx. \]

Roger Cotes (1682-1716) rose
from a modest background to
become, in 1704, the first
Plumian Professor at Cambridge
University. He made advances in
numerous mathematical areas
including numerical methods for
interpolation and integration.
Newton is reputed to have said of
Cotes "... if he had lived we might
have known something."

The following theorem details the error analysis associated with the closed Newton-Cotes
formulas. For a proof of this theorem, see [IK], p. 313.

Theorem 4.2 Suppose that $\sum_{i=0}^{n} a_i f(x_i)$ denotes the $(n + 1)$-point closed Newton-Cotes formula with
$x_0 = a$, $x_n = b$, and $h = (b - a)/n$. There exists $\xi \in (a, b)$ for which

\[ \int_a^b f(x)\,dx = \sum_{i=0}^{n} a_i f(x_i) + \frac{h^{n+3} f^{(n+2)}(\xi)}{(n+2)!} \int_0^n t^2 (t - 1) \cdots (t - n)\,dt, \]

if $n$ is even and $f \in C^{n+2}[a, b]$, and

\[ \int_a^b f(x)\,dx = \sum_{i=0}^{n} a_i f(x_i) + \frac{h^{n+2} f^{(n+1)}(\xi)}{(n+1)!} \int_0^n t (t - 1) \cdots (t - n)\,dt, \]

if $n$ is odd and $f \in C^{n+1}[a, b]$.

Note that when $n$ is an even integer, the degree of precision is $n + 1$, although the
interpolation polynomial is of degree at most $n$. When $n$ is odd, the degree of precision is
only $n$.

Some of the common closed Newton-Cotes formulas with their error terms are listed.
Note that in each case the unknown value $\xi$ lies in $(a, b)$.


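n = 1: Trapezoidal rule

\[ \int_{x_0}^{x_1} f(x)\,dx = \frac{h}{2}[f(x_0) + f(x_1)] - \frac{h^3}{12} f''(\xi), \quad \text{where } x_0 < \xi < x_1. \tag{4.25} \]

n = 2: Simpson’s rule

\[ \int_{x_0}^{x_2} f(x)\,dx = \frac{h}{3}[f(x_0) + 4f(x_1) + f(x_2)] - \frac{h^5}{90} f^{(4)}(\xi), \quad \text{where } x_0 < \xi < x_2. \tag{4.26} \]

n = 3: Simpson’s Three-Eighths rule

\[ \int_{x_0}^{x_3} f(x)\,dx = \frac{3h}{8}[f(x_0) + 3f(x_1) + 3f(x_2) + f(x_3)] - \frac{3h^5}{80} f^{(4)}(\xi), \quad \text{where } x_0 < \xi < x_3. \tag{4.27} \]

n = 4:

\[ \int_{x_0}^{x_4} f(x)\,dx = \frac{2h}{45}[7f(x_0) + 32f(x_1) + 12f(x_2) + 32f(x_3) + 7f(x_4)] - \frac{8h^7}{945} f^{(6)}(\xi), \quad \text{where } x_0 < \xi < x_4. \tag{4.28} \]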


Open Newton-Cotes Formulas 


The open Newton-Cotes formulas do not include the endpoints of $[a, b]$ as nodes. They use
the nodes $x_i = x_0 + ih$, for each $i = 0, 1, \ldots, n$, where $h = (b - a)/(n + 2)$ and $x_0 = a + h$.
This implies that $x_n = b - h$, so we label the endpoints by setting $x_{-1} = a$ and $x_{n+1} = b$,
as shown in Figure 4.6. Open formulas contain all the nodes used for the
approximation within the open interval $(a, b)$. The formulas become

\[ \int_a^b f(x)\,dx = \int_{x_{-1}}^{x_{n+1}} f(x)\,dx \approx \sum_{i=0}^{n} a_i f(x_i), \]

where

\[ a_i = \int_a^b L_i(x)\,dx. \]

Figure 4.6

The following theorem is analogous to Theorem 4.2; its proof is contained in [IK],
p. 314.

Theorem 4.3 Suppose that $\sum_{i=0}^{n} a_i f(x_i)$ denotes the $(n + 1)$-point open Newton-Cotes formula with
$x_{-1} = a$, $x_{n+1} = b$, and $h = (b - a)/(n + 2)$. There exists $\xi \in (a, b)$ for which

\[ \int_a^b f(x)\,dx = \sum_{i=0}^{n} a_i f(x_i) + \frac{h^{n+3} f^{(n+2)}(\xi)}{(n+2)!} \int_{-1}^{n+1} t^2 (t - 1) \cdots (t - n)\,dt, \]

if $n$ is even and $f \in C^{n+2}[a, b]$, and

\[ \int_a^b f(x)\,dx = \sum_{i=0}^{n} a_i f(x_i) + \frac{h^{n+2} f^{(n+1)}(\xi)}{(n+1)!} \int_{-1}^{n+1} t (t - 1) \cdots (t - n)\,dt, \]

if $n$ is odd and $f \in C^{n+1}[a, b]$.


Notice, as in the case of the closed methods, we have the degree of precision compar- 
atively higher for the even methods than for the odd methods. 

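Some of the common open Newton-Cotes formulas with their error terms are as
follows:

n = 0: Midpoint rule

\[ \int_{x_{-1}}^{x_1} f(x)\,dx = 2h f(x_0) + \frac{h^3}{3} f''(\xi), \quad \text{where } x_{-1} < \xi < x_1. \tag{4.29} \]

n = 1:

\[ \int_{x_{-1}}^{x_2} f(x)\,dx = \frac{3h}{2}[f(x_0) + f(x_1)] + \frac{3h^3}{4} f''(\xi), \quad \text{where } x_{-1} < \xi < x_2. \tag{4.30} \]

n = 2:

\[ \int_{x_{-1}}^{x_3} f(x)\,dx = \frac{4h}{3}[2f(x_0) - f(x_1) + 2f(x_2)] + \frac{14h^5}{45} f^{(4)}(\xi), \quad \text{where } x_{-1} < \xi < x_3. \tag{4.31} \]

n = 3:

\[ \int_{x_{-1}}^{x_4} f(x)\,dx = \frac{5h}{24}[11f(x_0) + f(x_1) + f(x_2) + 11f(x_3)] + \frac{95}{144} h^5 f^{(4)}(\xi), \quad \text{where } x_{-1} < \xi < x_4. \tag{4.32} \]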

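Example 2 Compare the results of the closed and open Newton-Cotes formulas listed as (4.25)-(4.28)
and (4.29)-(4.32) when approximating

\[ \int_0^{\pi/4} \sin x\,dx = 1 - \sqrt{2}/2 \approx 0.29289322. \]

Solution For the closed formulas we have

\[ n = 1: \quad \frac{(\pi/4)}{2} \left[ \sin 0 + \sin \frac{\pi}{4} \right] \approx 0.27768018 \]
\[ n = 2: \quad \frac{(\pi/8)}{3} \left[ \sin 0 + 4 \sin \frac{\pi}{8} + \sin \frac{\pi}{4} \right] \approx 0.29293264 \]
\[ n = 3: \quad \frac{3(\pi/12)}{8} \left[ \sin 0 + 3 \sin \frac{\pi}{12} + 3 \sin \frac{\pi}{6} + \sin \frac{\pi}{4} \right] \approx 0.29291070 \]
\[ n = 4: \quad \frac{2(\pi/16)}{45} \left[ 7 \sin 0 + 32 \sin \frac{\pi}{16} + 12 \sin \frac{\pi}{8} + 32 \sin \frac{3\pi}{16} + 7 \sin \frac{\pi}{4} \right] \approx 0.29289318 \]

and for the open formulas we have

\[ n = 0: \quad 2(\pi/8) \left[ \sin \frac{\pi}{8} \right] \approx 0.30055887 \]
\[ n = 1: \quad \frac{3(\pi/12)}{2} \left[ \sin \frac{\pi}{12} + \sin \frac{\pi}{6} \right] \approx 0.29798754 \]
\[ n = 2: \quad \frac{4(\pi/16)}{3} \left[ 2 \sin \frac{\pi}{16} - \sin \frac{\pi}{8} + 2 \sin \frac{3\pi}{16} \right] \approx 0.29285866 \]
\[ n = 3: \quad \frac{5(\pi/20)}{24} \left[ 11 \sin \frac{\pi}{20} + \sin \frac{\pi}{10} + \sin \frac{3\pi}{20} + 11 \sin \frac{\pi}{5} \right] \approx 0.29286923 \]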
" 24 20 10 20 5 
Table 4.8 summarizes these results and shows the approximation errors.

Table 4.8
n                 0            1            2            3            4
Closed formulas                0.27768018   0.29293264   0.29291070   0.29289318
Error                          0.01521303   0.00003942   0.00001748   0.00000004
Open formulas     0.30055887   0.29798754   0.29285866   0.29286923
Error             0.00766565   0.00509432   0.00003456   0.00002399

EXERCISE SET 4.3

1. Approximate the following integrals using the Trapezoidal rule.

a. $\int_{0.5}^{1} x^4\,dx$   b. $\int_0^{0.5} \frac{2}{x - 4}\,dx$
c. $\int_1^{1.5} x^2 \ln x\,dx$   d. $\int_0^1 x^2 e^{-x}\,dx$
e. $\int_1^{1.6} \frac{2x}{x^2 - 4}\,dx$   f. $\int_0^{0.35} \frac{2}{x^2 - 4}\,dx$
g. $\int_0^{\pi/4} x \sin x\,dx$   h. $\int_0^{\pi/4} e^{3x} \sin 2x\,dx$

2. Approximate the following integrals using the Trapezoidal rule.

a. $\int_{-0.25}^{0.25} (\cos x)^2\,dx$   b. $\int_{-0.5}^{0} x \ln(x + 1)\,dx$
c. $\int_{0.75}^{1.3} ((\sin x)^2 - 2x \sin x + 1)\,dx$   d. $\int_e^{e+1} \frac{1}{x \ln x}\,dx$

3. Find a bound for the error in Exercise 1 using the error formula, and compare this to the actual error.

4. Find a bound for the error in Exercise 2 using the error formula, and compare this to the actual error.

5. Repeat Exercise 1 using Simpson’s rule.

6. Repeat Exercise 2 using Simpson’s rule.

7. Repeat Exercise 3 using Simpson’s rule and the results of Exercise 5.

8. Repeat Exercise 4 using Simpson’s rule and the results of Exercise 6.

9. Repeat Exercise 1 using the Midpoint rule.

10. Repeat Exercise 2 using the Midpoint rule.

11. Repeat Exercise 3 using the Midpoint rule and the results of Exercise 9.

12. Repeat Exercise 4 using the Midpoint rule and the results of Exercise 10.

13. The Trapezoidal rule applied to $\int_0^2 f(x)\,dx$ gives the value 4, and Simpson’s rule gives the value 2.
What is $f(1)$?

14. The Trapezoidal rule applied to $\int_0^2 f(x)\,dx$ gives the value 5, and the Midpoint rule gives the value 4.
What value does Simpson’s rule give?

15. Find the degree of precision of the quadrature formula

\[ \int_{-1}^{1} f(x)\,dx = f\left(-\frac{\sqrt{3}}{3}\right) + f\left(\frac{\sqrt{3}}{3}\right). \]

16. Let $h = (b - a)/3$, $x_0 = a$, $x_1 = a + h$, and $x_2 = b$. Find the degree of precision of the quadrature
formula

\[ \int_a^b f(x)\,dx = \frac{9}{4} h f(x_1) + \frac{3}{4} h f(x_2). \]

17. The quadrature formula $\int_{-1}^{1} f(x)\,dx = c_0 f(-1) + c_1 f(0) + c_2 f(1)$ is exact for all polynomials of
degree less than or equal to 2. Determine $c_0$, $c_1$, and $c_2$.

18. The quadrature formula $\int_0^2 f(x)\,dx = c_0 f(0) + c_1 f(1) + c_2 f(2)$ is exact for all polynomials of
degree less than or equal to 2. Determine $c_0$, $c_1$, and $c_2$.

19. Find the constants $c_0$, $c_1$, and $x_1$ so that the quadrature formula

\[ \int_0^1 f(x)\,dx = c_0 f(0) + c_1 f(x_1) \]

has the highest possible degree of precision.

20. Find the constants $x_0$, $x_1$, and $c_1$ so that the quadrature formula

\[ \int_0^1 f(x)\,dx = \frac{1}{2} f(x_0) + c_1 f(x_1) \]

has the highest possible degree of precision.

21. Approximate the following integrals using formulas (4.25) through (4.32). Are the accuracies of
the approximations consistent with the error formulas? Which of parts (d) and (e) give the better
approximation?

a. $\int_0^{0.1} \sqrt{1 + x}\,dx$   b. $\int_0^{\pi/2} (\sin x)^2\,dx$
c. $\int_{1.1}^{1.5} e^x\,dx$   d. $\int_1^{10} \frac{1}{x}\,dx$
e. $\int_1^{5.5} \frac{1}{x}\,dx + \int_{5.5}^{10} \frac{1}{x}\,dx$   f. $\int_0^1 x^{1/3}\,dx$

22. Given the function $f$ at the following values,

x    | 1.8     | 2.0     | 2.2     | 2.4     | 2.6
f(x) | 3.12014 | 4.42569 | 6.04241 | 8.03014 | 10.46675

approximate $\int_{1.8}^{2.6} f(x)\,dx$ using all the appropriate quadrature formulas of this section.


23. Suppose that the data of Exercise 22 have round-off errors given by the following table.

x             | 1.8                | 2.0                 | 2.2                   | 2.4                   | 2.6
Error in f(x) | $2 \times 10^{-6}$ | $-2 \times 10^{-6}$ | $-0.9 \times 10^{-6}$ | $-0.9 \times 10^{-6}$ | $2 \times 10^{-8}$

Calculate the errors due to round-off in Exercise 22.

24. Derive Simpson’s rule with error term by using

\[ \int_{x_0}^{x_2} f(x)\,dx = a_0 f(x_0) + a_1 f(x_1) + a_2 f(x_2) + k f^{(4)}(\xi). \]

Find $a_0$, $a_1$, and $a_2$ from the fact that Simpson’s rule is exact for $f(x) = x^n$ when $n = 1, 2$, and 3.
Then find $k$ by applying the integration formula with $f(x) = x^4$.

25. Prove the statement following Definition 4.1; that is, show that a quadrature formula has degree of
precision $n$ if and only if the error $E(P(x)) = 0$ for all polynomials $P(x)$ of degree $k = 0, 1, \ldots, n$,
but $E(P(x)) \neq 0$ for some polynomial $P(x)$ of degree $n + 1$.

26. Derive Simpson’s three-eighths rule (the closed rule with $n = 3$) with error term by using
Theorem 4.2.

27. Derive the open rule with $n = 1$ with error term by using Theorem 4.3.


4.4 Composite Numerical Integration

The Newton-Cotes formulas are generally unsuitable for use over large integration intervals.
High-degree formulas would be required, and the values of the coefficients in these
formulas are difficult to obtain. Also, the Newton-Cotes formulas are based on interpolatory
polynomials that use equally-spaced nodes, a procedure that is inaccurate over large
intervals because of the oscillatory nature of high-degree polynomials.

In this section, we discuss a piecewise approach to numerical integration that uses the
low-order Newton-Cotes formulas. These are the techniques most often applied.

Piecewise approximation is often
effective. Recall that this was
used for spline interpolation.


Example 1 Use Simpson’s rule to approximate $\int_0^4 e^x\,dx$ and compare this to the results obtained
by adding the Simpson’s rule approximations for $\int_0^2 e^x\,dx$ and $\int_2^4 e^x\,dx$. Compare these
approximations to the sum of Simpson’s rule for $\int_0^1 e^x\,dx$, $\int_1^2 e^x\,dx$, $\int_2^3 e^x\,dx$, and $\int_3^4 e^x\,dx$.

Solution Simpson’s rule on $[0, 4]$ uses $h = 2$ and gives

\[ \int_0^4 e^x\,dx \approx \frac{2}{3}(e^0 + 4e^2 + e^4) = 56.76958. \]

The exact answer in this case is $e^4 - e^0 = 53.59815$, and the error $-3.17143$ is far larger
than we would normally accept.

Applying Simpson’s rule on each of the intervals $[0, 2]$ and $[2, 4]$ uses $h = 1$ and gives

\[ \int_0^4 e^x\,dx = \int_0^2 e^x\,dx + \int_2^4 e^x\,dx \approx \frac{1}{3}(e^0 + 4e + e^2) + \frac{1}{3}(e^2 + 4e^3 + e^4) = \frac{1}{3}(e^0 + 4e + 2e^2 + 4e^3 + e^4) = 53.86385. \]

The error has been reduced to $-0.26570$.

For the integrals on $[0, 1]$, $[1, 2]$, $[2, 3]$, and $[3, 4]$ we use Simpson’s rule four times with
$h = \frac{1}{2}$, giving

\[ \int_0^4 e^x\,dx = \int_0^1 e^x\,dx + \int_1^2 e^x\,dx + \int_2^3 e^x\,dx + \int_3^4 e^x\,dx \]
\[ \approx \frac{1}{6}(e^0 + 4e^{1/2} + e) + \frac{1}{6}(e + 4e^{3/2} + e^2) + \frac{1}{6}(e^2 + 4e^{5/2} + e^3) + \frac{1}{6}(e^3 + 4e^{7/2} + e^4) \]
\[ = \frac{1}{6}(e^0 + 4e^{1/2} + 2e + 4e^{3/2} + 2e^2 + 4e^{5/2} + 2e^3 + 4e^{7/2} + e^4) = 53.61622. \]

The error for this approximation has been reduced to $-0.01807$.
To generalize this procedure for an arbitrary integral $\int_a^b f(x)\,dx$, choose an even
integer $n$. Subdivide the interval $[a, b]$ into $n$ subintervals, and apply Simpson’s rule on
each consecutive pair of subintervals. (See Figure 4.7.)

Figure 4.7

With $h = (b - a)/n$ and $x_j = a + jh$, for each $j = 0, 1, \ldots, n$, we have

\[ \int_a^b f(x)\,dx = \sum_{j=1}^{n/2} \int_{x_{2j-2}}^{x_{2j}} f(x)\,dx = \sum_{j=1}^{n/2} \left\{ \frac{h}{3}[f(x_{2j-2}) + 4f(x_{2j-1}) + f(x_{2j})] - \frac{h^5}{90} f^{(4)}(\xi_j) \right\}, \]

for some $\xi_j$ with $x_{2j-2} < \xi_j < x_{2j}$, provided that $f \in C^4[a, b]$. Using the fact that for each
$j = 1, 2, \ldots, (n/2) - 1$ we have $f(x_{2j})$ appearing in the term corresponding to the interval
$[x_{2j-2}, x_{2j}]$ and also in the term corresponding to the interval $[x_{2j}, x_{2j+2}]$, we can reduce
this sum to

\[ \int_a^b f(x)\,dx = \frac{h}{3} \left[ f(x_0) + 2 \sum_{j=1}^{(n/2)-1} f(x_{2j}) + 4 \sum_{j=1}^{n/2} f(x_{2j-1}) + f(x_n) \right] - \frac{h^5}{90} \sum_{j=1}^{n/2} f^{(4)}(\xi_j). \]

The error associated with this approximation is

\[ E(f) = -\frac{h^5}{90} \sum_{j=1}^{n/2} f^{(4)}(\xi_j), \]

where $x_{2j-2} < \xi_j < x_{2j}$, for each $j = 1, 2, \ldots, n/2$.

If $f \in C^4[a, b]$, the Extreme Value Theorem 1.9 implies that $f^{(4)}$ assumes its maximum
and minimum in $[a, b]$. Since

\[ \min_{x \in [a,b]} f^{(4)}(x) \leq f^{(4)}(\xi_j) \leq \max_{x \in [a,b]} f^{(4)}(x), \]

we have

\[ \frac{n}{2} \min_{x \in [a,b]} f^{(4)}(x) \leq \sum_{j=1}^{n/2} f^{(4)}(\xi_j) \leq \frac{n}{2} \max_{x \in [a,b]} f^{(4)}(x) \]

and

\[ \min_{x \in [a,b]} f^{(4)}(x) \leq \frac{2}{n} \sum_{j=1}^{n/2} f^{(4)}(\xi_j) \leq \max_{x \in [a,b]} f^{(4)}(x). \]

By the Intermediate Value Theorem 1.11, there is a $\mu \in (a, b)$ such that

\[ f^{(4)}(\mu) = \frac{2}{n} \sum_{j=1}^{n/2} f^{(4)}(\xi_j). \]

Thus

\[ E(f) = -\frac{h^5}{90} \sum_{j=1}^{n/2} f^{(4)}(\xi_j) = -\frac{h^5}{180} n f^{(4)}(\mu), \]

or, since $h = (b - a)/n$,

\[ E(f) = -\frac{(b - a)}{180} h^4 f^{(4)}(\mu). \]

These observations produce the following result.

Theorem 4.4 Let $f \in C^4[a, b]$, $n$ be even, $h = (b - a)/n$, and $x_j = a + jh$, for each $j = 0, 1, \ldots, n$.
There exists a $\mu \in (a, b)$ for which the Composite Simpson’s rule for $n$ subintervals can
be written with its error term as

\[ \int_a^b f(x)\,dx = \frac{h}{3} \left[ f(a) + 2 \sum_{j=1}^{(n/2)-1} f(x_{2j}) + 4 \sum_{j=1}^{n/2} f(x_{2j-1}) + f(b) \right] - \frac{b - a}{180} h^4 f^{(4)}(\mu). \]

Notice that the error term for the Composite Simpson’s rule is $O(h^4)$, whereas it was
$O(h^5)$ for the standard Simpson’s rule. However, these rates are not comparable because for
standard Simpson’s rule we have $h$ fixed at $h = (b - a)/2$, but for Composite Simpson’s
rule we have $h = (b - a)/n$, for $n$ an even integer. This permits us to considerably reduce
the value of $h$ when the Composite Simpson’s rule is used.

Algorithm 4.1 uses the Composite Simpson’s rule on n subintervals. This is the most 
frequently used general-purpose quadrature algorithm. 


Composite Simpson’s Rule
To approximate the integral $I = \int_a^b f(x)\,dx$:

INPUT endpoints a, b; even positive integer n.
OUTPUT approximation XI to I.
Step 1 Set h = (b - a)/n.

Step 2 Set XI0 = f(a) + f(b);
       XI1 = 0; (Summation of f(x_{2i-1}).)
       XI2 = 0. (Summation of f(x_{2i}).)

Step 3 For i = 1, ..., n - 1 do Steps 4 and 5.
    Step 4 Set X = a + ih.

    Step 5 If i is even then set XI2 = XI2 + f(X)
           else set XI1 = XI1 + f(X).

Step 6 Set XI = h(XI0 + 2 · XI2 + 4 · XI1)/3.

Step 7 OUTPUT (XI);
       STOP.
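The steps of Algorithm 4.1 translate almost line for line into code. The following Python sketch is one possible transcription, not part of the original text; the function name `composite_simpson` and the argument check are illustrative choices.

```python
import math

def composite_simpson(f, a, b, n):
    """Approximate the integral of f over [a, b] with Composite Simpson's rule.

    n is the number of subintervals and must be a positive even integer.
    """
    if n <= 0 or n % 2 != 0:
        raise ValueError("n must be a positive even integer")
    h = (b - a) / n                  # Step 1
    xi0 = f(a) + f(b)                # Step 2: endpoint terms
    xi1 = 0.0                        # sum of f at odd-indexed nodes
    xi2 = 0.0                        # sum of f at even-indexed interior nodes
    for i in range(1, n):            # Step 3
        x = a + i * h                # Step 4
        if i % 2 == 0:               # Step 5
            xi2 += f(x)
        else:
            xi1 += f(x)
    return h * (xi0 + 2.0 * xi2 + 4.0 * xi1) / 3.0   # Step 6

# Integral of sin x over [0, pi] with n = 18 subintervals; the exact value is 2.
approx = composite_simpson(math.sin, 0.0, math.pi, 18)
print(f"{approx:.7f}")
```

Passing the integrand as a function argument keeps the routine reusable for any of the integrals in this section.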


The subdivision approach can be applied to any of the Newton-Cotes formulas. The 
extensions of the Trapezoidal (see Figure 4.8) and Midpoint rules are given without proof. 
The Trapezoidal rule requires only one interval for each application, so the integer n can be 
either odd or even. 


Theorem 4.5 Let $f \in C^2[a, b]$, $h = (b - a)/n$, and $x_j = a + jh$, for each $j = 0, 1, \ldots, n$. There exists a $\mu \in (a, b)$ for which the Composite Trapezoidal rule for $n$ subintervals can be written with its error term as

$$\int_a^b f(x)\,dx = \frac{h}{2}\left[ f(a) + 2\sum_{j=1}^{n-1} f(x_j) + f(b) \right] - \frac{b - a}{12} h^2 f''(\mu).$$

Figure 4.8


For the Composite Midpoint rule, n must again be even. (See Figure 4.9.) 


Figure 4.9


Theorem 4.6 Let $f \in C^2[a, b]$, $n$ be even, $h = (b - a)/(n + 2)$, and $x_j = a + (j + 1)h$ for each $j = -1, 0, \ldots, n + 1$. There exists a $\mu \in (a, b)$ for which the Composite Midpoint rule for $n + 2$ subintervals can be written with its error term as

$$\int_a^b f(x)\,dx = 2h \sum_{j=0}^{n/2} f(x_{2j}) + \frac{b - a}{6} h^2 f''(\mu).$$
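The two composite rules in Theorems 4.5 and 4.6 can be sketched in Python in the same way. The function names are illustrative assumptions; the midpoint routine follows the indexing of Theorem 4.6, in which $h = (b - a)/(n + 2)$ and the evaluation points are $x_{2j} = a + (2j + 1)h$.

```python
import math

def composite_trapezoid(f, a, b, n):
    """Composite Trapezoidal rule on n subintervals (n may be odd or even)."""
    if n <= 0:
        raise ValueError("n must be a positive integer")
    h = (b - a) / n
    interior = sum(f(a + j * h) for j in range(1, n))
    return h * (f(a) + 2.0 * interior + f(b)) / 2.0

def composite_midpoint(f, a, b, n):
    """Composite Midpoint rule in the notation of Theorem 4.6.

    n is a nonnegative even integer, h = (b - a)/(n + 2), and the rule
    sums f at the nodes x_0, x_2, ..., x_n, where x_j = a + (j + 1)h.
    """
    if n < 0 or n % 2 != 0:
        raise ValueError("n must be a nonnegative even integer")
    h = (b - a) / (n + 2)
    total = sum(f(a + (2 * j + 1) * h) for j in range(n // 2 + 1))
    return 2.0 * h * total

trap = composite_trapezoid(math.sin, 0.0, math.pi, 18)
mid = composite_midpoint(math.sin, 0.0, math.pi, 16)   # n + 2 = 18 subintervals
print(trap, mid)
```

For $\sin x$ on $[0, \pi]$ the second derivative is negative, so the signs of the two error terms predict that the Trapezoidal value falls below 2 and the Midpoint value above it.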


Example 2 Determine values of $h$ that will ensure an approximation error of less than 0.00002 when
approximating $\int_0^{\pi} \sin x\,dx$ and employing
(a) Composite Trapezoidal rule and (b) Composite Simpson’s rule.


Solution (a) The error form for the Composite Trapezoidal rule for $f(x) = \sin x$ on $[0, \pi]$ is

$$\left| \frac{\pi h^2}{12} f''(\mu) \right| = \left| \frac{\pi h^2}{12} (-\sin \mu) \right| = \frac{\pi h^2}{12} |\sin \mu|.$$

To ensure sufficient accuracy with this technique we need to have

$$\frac{\pi h^2}{12} |\sin \mu| \le \frac{\pi h^2}{12} < 0.00002.$$

Since $h = \pi/n$, we need

$$\frac{\pi^3}{12 n^2} < 0.00002, \quad \text{which implies that} \quad n > \left( \frac{\pi^3}{12(0.00002)} \right)^{1/2} \approx 359.44,$$

and the Composite Trapezoidal rule requires $n \ge 360$.


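(b) The error form for the Composite Simpson’s rule for $f(x) = \sin x$ on $[0, \pi]$ is

$$\left| \frac{\pi h^4}{180} f^{(4)}(\mu) \right| = \left| \frac{\pi h^4}{180} \sin \mu \right| = \frac{\pi h^4}{180} |\sin \mu|.$$

To ensure sufficient accuracy with this technique we need to have

$$\frac{\pi h^4}{180} |\sin \mu| \le \frac{\pi h^4}{180} < 0.00002.$$

Using again the fact that $n = \pi/h$ gives

$$\frac{\pi^5}{180 n^4} < 0.00002, \quad \text{which implies that} \quad n > \left( \frac{\pi^5}{180(0.00002)} \right)^{1/4} \approx 17.07.$$

So Composite Simpson’s rule requires only $n \ge 18$.
Composite Simpson’s rule with $n = 18$ gives

$$\int_0^{\pi} \sin x\,dx \approx \frac{\pi}{54} \left[ 2\sum_{j=1}^{8} \sin\left( \frac{j\pi}{9} \right) + 4\sum_{j=1}^{9} \sin\left( \frac{(2j - 1)\pi}{18} \right) \right] = 2.0000104.$$

This is accurate to within about $10^{-5}$ because the true value is $-\cos(\pi) - (-\cos(0)) = 2$.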


Composite Simpson’s rule is the clear choice if you wish to minimize computation.
For comparison purposes, consider the Composite Trapezoidal rule using $h = \pi/18$ for the
integral in Example 2. This approximation uses the same function evaluations as Composite
Simpson’s rule, but the approximation in this case,

$$\int_0^{\pi} \sin x\,dx \approx \frac{\pi}{36} \left[ 2\sum_{j=1}^{17} \sin\left( \frac{j\pi}{18} \right) + \sin 0 + \sin \pi \right] = \frac{\pi}{36} \left[ 2\sum_{j=1}^{17} \sin\left( \frac{j\pi}{18} \right) \right] = 1.9949205,$$

is accurate only to about $5 \times 10^{-3}$.
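The subinterval counts in Example 2 can be reproduced with a few lines of arithmetic. The sketch below is illustrative only; it uses the bounds $|f''| \le 1$ and $|f^{(4)}| \le 1$ for $f(x) = \sin x$, and the variable names are assumptions made for this example.

```python
import math

tol = 0.00002
length = math.pi                  # b - a for the interval [0, pi]

# Trapezoidal: pi^3 / (12 n^2) < tol  =>  n > sqrt(pi^3 / (12 tol))
n_trap_bound = math.sqrt(length**3 / (12 * tol))
n_trap = math.floor(n_trap_bound) + 1

# Simpson: pi^5 / (180 n^4) < tol  =>  n > (pi^5 / (180 tol))^(1/4), n even
n_simp_bound = (length**5 / (180 * tol)) ** 0.25
n_simp = math.floor(n_simp_bound) + 1
if n_simp % 2 == 1:               # Composite Simpson's rule needs an even n
    n_simp += 1

print(f"Trapezoidal: n > {n_trap_bound:.2f}, so n = {n_trap}")
print(f"Simpson:     n > {n_simp_bound:.2f}, so n = {n_simp}")
```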
Maple contains numerous procedures for numerical integration in the NumericalAnalysis
subpackage of the Student package. First access the library as usual with


with(Student[NumericalAnalysis])


The command for all methods is Quadrature with the options in the call specifying the
method to be used. We will use the Trapezoidal method to illustrate the procedure. First
define the function and the interval of integration with


f := x -> sin(x); a := 0.0; b := Pi




After Maple responds with the function and the interval, enter the command 


Quadrature (f (x), x = a..b, method = trapezoid, partition = 20, output = value) 
1.995885973 


The value of the step size h in this instance is the width of the interval b — a divided by the 
number specified by partition = 20. 

Simpson’s method can be called in a similar manner, except that the step size h is 
determined by b — a divided by twice the value of partition. Hence, the Simpson’s rule 
approximation using the same nodes as those in the Trapezoidal rule is called with 


Quadrature (f (x), x = a..b, method = simpson, partition = 10, output = value) 
2.000006785 
Any of the Newton-Cotes methods can be called using the option
method = newtoncotes[open, n] or method = newtoncotes[closed, n]


Be careful to correctly specify the number in partition when an even number of divisions 
is required, and when an open method is employed. 


Round-Off Error Stability 


In Example 2 we saw that ensuring an accuracy of $2 \times 10^{-5}$ for approximating $\int_0^{\pi} \sin x\,dx$
required 360 subdivisions of $[0, \pi]$ for the Composite Trapezoidal rule and only 18 for
Composite Simpson’s rule. In addition to the fact that less computation is needed for the 
Simpson’s technique, you might suspect that because of fewer computations this method 
would also involve less round-off error. However, an important property shared by all the 
composite integration techniques is a stability with respect to round-off error. That is, the 
round-off error does not depend on the number of calculations performed. 

Numerical integration is expected to be stable, whereas numerical differentiation is unstable.

To demonstrate this rather amazing fact, suppose we apply the Composite Simpson’s
rule with $n$ subintervals to a function $f$ on $[a, b]$ and determine the maximum bound for the
round-off error. Assume that $f(x_i)$ is approximated by $\tilde{f}(x_i)$ and that

$$f(x_i) = \tilde{f}(x_i) + e_i, \quad \text{for each } i = 0, 1, \ldots, n,$$

where $e_i$ denotes the round-off error associated with using $\tilde{f}(x_i)$ to approximate $f(x_i)$. Then
the accumulated error, $e(h)$, in the Composite Simpson’s rule is

$$e(h) = \left| \frac{h}{3}\left[ e_0 + 2\sum_{j=1}^{(n/2)-1} e_{2j} + 4\sum_{j=1}^{n/2} e_{2j-1} + e_n \right] \right| \le \frac{h}{3}\left[ |e_0| + 2\sum_{j=1}^{(n/2)-1} |e_{2j}| + 4\sum_{j=1}^{n/2} |e_{2j-1}| + |e_n| \right].$$

If the round-off errors are uniformly bounded by $\varepsilon$, then

$$e(h) \le \frac{h}{3}\left[ \varepsilon + 2\left( \frac{n}{2} - 1 \right)\varepsilon + 4\left( \frac{n}{2} \right)\varepsilon + \varepsilon \right] = \frac{h}{3}\, 3n\varepsilon = nh\varepsilon.$$

But $nh = b - a$, so

$$e(h) \le (b - a)\varepsilon,$$

a bound independent of $h$ (and $n$). This means that, even though we may need to divide
an interval into more parts to ensure accuracy, the increased computation that is required 
does not increase the round-off error. This result implies that the procedure is stable as h 
approaches zero. Recall that this was not true of the numerical differentiation procedures 
considered at the beginning of this chapter. 
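The bound $e(h) \le (b - a)\varepsilon$ can be checked numerically by pushing simulated round-off errors through the Simpson weights. The following Python sketch is an illustration under stated assumptions: the errors $e_i$ are drawn uniformly from $[-\varepsilon, \varepsilon]$, and the helper names `simpson_weights` and `accumulated_error` are hypothetical.

```python
import random

def simpson_weights(n):
    """Weights 1, 4, 2, 4, ..., 2, 4, 1 of Composite Simpson's rule (n even)."""
    return [1 if i in (0, n) else (4 if i % 2 else 2) for i in range(n + 1)]

def accumulated_error(errors, a, b):
    """e(h) = |(h/3) * sum of w_i * e_i| for the given round-off errors e_i."""
    n = len(errors) - 1
    h = (b - a) / n
    weighted = sum(w * e for w, e in zip(simpson_weights(n), errors))
    return abs(h / 3.0 * weighted)

rng = random.Random(0)
a, b, eps = 0.0, 1.0, 1e-8
for n in (10, 100, 1000, 10000):
    errors = [rng.uniform(-eps, eps) for _ in range(n + 1)]
    simulated = accumulated_error(errors, a, b)
    worst = accumulated_error([eps] * (n + 1), a, b)   # every e_i equal to eps
    print(n, simulated, worst)
```

The worst-case column stays at $(b - a)\varepsilon$ for every $n$, which is the independence from $n$ that the derivation asserts.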


EXERCISE SET 4.4 


1. Use the Composite Trapezoidal rule with the indicated values of n to approximate the following 


integrals. 
2 2 
a. / xInx dx, n=4 b. i, ve'dx, n=4 
- , ~2 
c. / ——dx, n=6 d. / xcosxdx, n=6 
9 +4 0 
2 ay 
e. i e* sin3x dx, n=8 f. / —~— dx, n=8 
1 x2 + 4 


ns 1 32/8 
2 ———- dx, n=8 h. / tanx dx, n=8 
P 3 V¥x2—4 0 


2. Use the Composite Trapezoidal rule with the indicated values of n to approximate the following 


integrals. 
0.5 0.5 
a. / cos*xdx, n=4 b. / xIn(v+1)dx, n=6 
-0.5 -0.5 
1.75 e+2 
c. / (sin? x —2xsinx+1)dx, n=8 d. / dx, n=8 
15 2 xInx 


3. Use the Composite Simpson’s rule to approximate the integrals in Exercise 1.

4. Use the Composite Simpson’s rule to approximate the integrals in Exercise 2.

5. Use the Composite Midpoint rule with n + 2 subintervals to approximate the integrals in Exercise 1.
6. Use the Composite Midpoint rule with n + 2 subintervals to approximate the integrals in Exercise 2.
Approximate i, x? In(x? + 1) dx using h = 0.25. Use 


a. Composite Trapezoidal rule. 




b. Composite Simpson’s rule. 
c. Composite Midpoint rule. 
8. Approximate i xe dx using h = 0.25. Use 
a. Composite Trapezoidal rule. 
b. Composite Simpson’s rule. 
c. Composite Midpoint rule. 
9. Suppose that $f(0) = 1$, $f(0.5) = 2.5$, $f(1) = 2$, and $f(0.25) = f(0.75) = \alpha$. Find $\alpha$ if the
Composite Trapezoidal rule with $n = 4$ gives the value 1.75 for $\int_0^1 f(x)\,dx$.


10. The Midpoint rule for approximating $\int_{-1}^{1} f(x)\,dx$ gives the value 12, the Composite Midpoint rule
with $n = 2$ gives 5, and Composite Simpson’s rule gives 6. Use the fact that $f(-1) = f(1)$ and
$f(-0.5) = f(0.5) - 1$ to determine $f(-1)$, $f(-0.5)$, $f(0)$, $f(0.5)$, and $f(1)$.

11. Determine the values of n and h required to approximate 


2 
Dy « 
/ e~ sin 3x dx 
0 


to within 10~*. Use 
a. Composite Trapezoidal rule. 
b. Composite Simpson’s rule. 


c. Composite Midpoint rule.
12. Repeat Exercise 11 for the integral " x? cos.x dx.


13. Determine the values of $n$ and $h$ required to approximate

$$\int_0^2 \frac{1}{x + 4}\,dx$$

to within $10^{-5}$ and compute the approximation. Use


a. Composite Trapezoidal rule. 

b. Composite Simpson’s rule. 

c. Composite Midpoint rule. 

Repeat Exercise 13 for the integral /, i xInx dx. 
Let f be defined by 


e+1, 0<x<0.1, 
f(x) = 41.001 + 0.03% — 0.1) + 0.3(4 — 0.1)? + 2(x — 0.17, 0.1 <x < 0.2, 
1.009 + 0.15(« — 0.2) + 0.9(« — 0.2)? + 2(x — 0.2)7,, 0.2 <x < 0.3. 


a. Investigate the continuity of the derivatives of f. 


b. Use the Composite Trapezoidal rule with n = 6 to approximate ie J (&) dx, and estimate the 
error using the error bound. 


ae, : i ’ 0.3 
c. Use the Composite Simpson’s rule with n = 6 to approximate th Ff (x) dx. Are the results more 
accurate than in part (b)? 


16. Show that the error $E(f)$ for Composite Simpson’s rule can be approximated by

$$-\frac{h^4}{180}\left[ f'''(b) - f'''(a) \right].$$

[Hint: $\sum_{j=1}^{n/2} f^{(4)}(\xi_j)(2h)$ is a Riemann Sum for $\int_a^b f^{(4)}(x)\,dx$.]

17. a. Derive an estimate for $E(f)$ in the Composite Trapezoidal rule using the method in Exercise 16.
b. Repeat part (a) for the Composite Midpoint rule.

18. Use the error estimates of Exercises 16 and 17 to estimate the errors in Exercise 12.

19. Use the error estimates of Exercises 16 and 17 to estimate the errors in Exercise 14.


20. In multivariable calculus and in statistics courses it is shown that

$$\int_{-\infty}^{\infty} \frac{1}{\sigma\sqrt{2\pi}}\, e^{-(1/2)(x/\sigma)^2}\,dx = 1,$$

for any positive $\sigma$. The function

$$f(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-(1/2)(x/\sigma)^2}$$

is the normal density function with mean $\mu = 0$ and standard deviation $\sigma$. The probability that a
randomly chosen value described by this distribution lies in $[a, b]$ is given by $\int_a^b f(x)\,dx$. Approximate
to within $10^{-5}$ the probability that a randomly chosen value described by this distribution will lie in
a. $[-\sigma, \sigma]$ b. $[-2\sigma, 2\sigma]$ c. $[-3\sigma, 3\sigma]$

21. Determine to within $10^{-6}$ the length of the graph of the ellipse with equation $4x^2 + 9y^2 = 36$.

22. A car laps a race track in 84 seconds. The speed of the car at each 6-second interval is determined
by using a radar gun and is given from the beginning of the lap, in feet/second, by the entries in the
following table.


Time  |   0 |   6 |  12 |  18 |  24 |  30 |  36 |  42 |  48 |  54 |  60 |  66 |  72 |  78 |  84
Speed | 124 | 134 | 148 | 156 | 147 | 133 | 121 | 109 |  99 |  85 |  78 |  89 | 104 | 116 | 123


How long is the track?


A particle of mass m moving through a fluid is subjected to a viscous resistance R, which is a function 
of the velocity v. The relationship between the resistance R, velocity v, and time f¢ is given by the 


equation 
v(t) m 
t= du. 
v(t) R(u) 


Suppose that R(v) = —v./V for a particular fluid, where R is in newtons and v is in meters/second. If 
m = 10 kg and v(0) = 10 m/s, approximate the time required for the particle to slow to v = 5 m/s. 


To simulate the thermal characteristics of disk brakes (see the following figure), D. A. Secrist and 
R. W. Hornbeck [SH] needed to approximate numerically the “area averaged lining temperature,” T, 
of the brake pad from the equation 


ro 
/ T(r)r6, dr 


TO > 
i, r@, dr 
Te 


where r, represents the radius at which the pad-disk contact begins, ro represents the outside radius 
of the pad-disk contact, 6, represents the angle subtended by the sector brake pads, and T(r) is the 
temperature at each point of the pad, obtained numerically from analyzing the heat equation (see 
Section 12.2). Suppose r. = 0.308 ft, ro = 0.478 ft, 6, = 0.7051 radians, and the temperatures given 
in the following table have been calculated at the various points on the disk. Approximate T. 


T= 


rif} TO)CF) rift) TH) CF) rf} Tr) CF) 


0.308 640 0.376 1034 0.444 1204 
0.325 794 0.393 1064 0.461 1222 
0.342 885 0.410 1114 0.478 1239 
0.359 943 0.427 1152 


Brake disk 


Find an approximation to within 10-4 of the value of the integral considered in the application opening 
this chapter: 


48 
1+ (cos x)? dx. 
0 


26. The equation

$$\int_0^x \frac{1}{\sqrt{2\pi}}\, e^{-t^2/2}\,dt = 0.45$$

can be solved for $x$ by using Newton’s method with

$$f(x) = \int_0^x \frac{1}{\sqrt{2\pi}}\, e^{-t^2/2}\,dt - 0.45$$

and

$$f'(x) = \frac{1}{\sqrt{2\pi}}\, e^{-x^2/2}.$$

To evaluate $f$ at the approximation $p_k$, we need a quadrature formula to approximate

$$\int_0^{p_k} \frac{1}{\sqrt{2\pi}}\, e^{-t^2/2}\,dt.$$

a. Find a solution to $f(x) = 0$ accurate to within $10^{-5}$ using Newton’s method with $p_0 = 0.5$ and
the Composite Simpson’s rule.


b. Repeat (a) using the Composite Trapezoidal rule in place of the Composite Simpson’s rule. 


4.5 Romberg Integration


In this section we will illustrate how Richardson extrapolation applied to results from the 
Composite Trapezoidal rule can be used to obtain high accuracy approximations with little 
computational cost. 

In Section 4.4 we found that the Composite Trapezoidal rule has a truncation error of
order $O(h^2)$. Specifically, we showed that for $h = (b - a)/n$ and $x_j = a + jh$ we have

$$\int_a^b f(x)\,dx = \frac{h}{2}\left[ f(a) + 2\sum_{j=1}^{n-1} f(x_j) + f(b) \right] - \frac{(b - a) f''(\mu)}{12} h^2,$$

for some number $\mu$ in $(a, b)$.
By an alternative method it can be shown (see [RR], pp. 136-140), that if $f \in C^{\infty}[a, b]$,
the Composite Trapezoidal rule can also be written with an error term in the form

$$\int_a^b f(x)\,dx = \frac{h}{2}\left[ f(a) + 2\sum_{j=1}^{n-1} f(x_j) + f(b) \right] + K_1 h^2 + K_2 h^4 + K_3 h^6 + \cdots, \tag{4.33}$$

where each $K_i$ is a constant that depends only on $f^{(2i-1)}(a)$ and $f^{(2i-1)}(b)$.
Recall from Section 4.2 that Richardson extrapolation can be performed on any
approximation procedure whose truncation error is of the form

$$\sum_{j=1}^{m-1} K_j h^{\alpha_j} + O(h^{\alpha_m}),$$

for a collection of constants $K_j$ and when $\alpha_1 < \alpha_2 < \alpha_3 < \cdots < \alpha_m$. In that section we
gave demonstrations to illustrate how effective this technique is when the approximation
procedure has a truncation error with only even powers of $h$, that is, when the truncation
error has the form

$$\sum_{j=1}^{m-1} K_j h^{2j} + O(h^{2m}).$$

Werner Romberg (1909-2003) devised this procedure for improving the accuracy of the Trapezoidal rule by eliminating the successive terms in the asymptotic expansion in 1955.

Because the Composite Trapezoidal rule has this form, it is an obvious candidate for
extrapolation. This results in a technique known as Romberg integration.
To approximate the integral $\int_a^b f(x)\,dx$ we use the results of the Composite Trapezoidal
rule with $n = 1, 2, 4, 8, 16, \ldots$, and denote the resulting approximations, respectively, by
$R_{1,1}$, $R_{2,1}$, $R_{3,1}$, etc. We then apply extrapolation in the manner given in Section 4.2, that is,
we obtain $O(h^4)$ approximations $R_{2,2}$, $R_{3,2}$, $R_{4,2}$, etc., by


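$$R_{k,2} = R_{k,1} + \frac{1}{3}(R_{k,1} - R_{k-1,1}), \quad \text{for } k = 2, 3, \ldots$$

Then $O(h^6)$ approximations $R_{3,3}$, $R_{4,3}$, $R_{5,3}$, etc., by

$$R_{k,3} = R_{k,2} + \frac{1}{15}(R_{k,2} - R_{k-1,2}), \quad \text{for } k = 3, 4, \ldots.$$

In general, after the appropriate $R_{k,j-1}$ approximations have been obtained, we determine
the $O(h^{2j})$ approximations from

$$R_{k,j} = R_{k,j-1} + \frac{1}{4^{j-1} - 1}(R_{k,j-1} - R_{k-1,j-1}), \quad \text{for } k = j, j + 1, \ldots$$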


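Example 1 Use the Composite Trapezoidal rule to find approximations to $\int_0^{\pi} \sin x\,dx$ with $n = 1$, 2, 4,
8, and 16. Then perform Romberg extrapolation on the results.
The Composite Trapezoidal rule for the various values of $n$ gives the following approximations to the true value 2.

$$R_{1,1} = \frac{\pi}{2}\left[ \sin 0 + \sin \pi \right] = 0;$$

$$R_{2,1} = \frac{\pi}{4}\left[ \sin 0 + 2\sin\frac{\pi}{2} + \sin \pi \right] = 1.57079633;$$

$$R_{3,1} = \frac{\pi}{8}\left[ \sin 0 + 2\left( \sin\frac{\pi}{4} + \sin\frac{\pi}{2} + \sin\frac{3\pi}{4} \right) + \sin \pi \right] = 1.89611890;$$

$$R_{4,1} = \frac{\pi}{16}\left[ \sin 0 + 2\left( \sin\frac{\pi}{8} + \sin\frac{\pi}{4} + \cdots + \sin\frac{3\pi}{4} + \sin\frac{7\pi}{8} \right) + \sin \pi \right] = 1.97423160;$$

$$R_{5,1} = \frac{\pi}{32}\left[ \sin 0 + 2\left( \sin\frac{\pi}{16} + \sin\frac{\pi}{8} + \cdots + \sin\frac{7\pi}{8} + \sin\frac{15\pi}{16} \right) + \sin \pi \right] = 1.99357034.$$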

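The $O(h^4)$ approximations are

$$R_{2,2} = R_{2,1} + \tfrac{1}{3}(R_{2,1} - R_{1,1}) = 2.09439511; \quad R_{3,2} = R_{3,1} + \tfrac{1}{3}(R_{3,1} - R_{2,1}) = 2.00455976;$$

$$R_{4,2} = R_{4,1} + \tfrac{1}{3}(R_{4,1} - R_{3,1}) = 2.00026917; \quad R_{5,2} = R_{5,1} + \tfrac{1}{3}(R_{5,1} - R_{4,1}) = 2.00001659.$$

The $O(h^6)$ approximations are

$$R_{3,3} = R_{3,2} + \tfrac{1}{15}(R_{3,2} - R_{2,2}) = 1.99857073; \quad R_{4,3} = R_{4,2} + \tfrac{1}{15}(R_{4,2} - R_{3,2}) = 1.99998313;$$

$$R_{5,3} = R_{5,2} + \tfrac{1}{15}(R_{5,2} - R_{4,2}) = 1.99999975.$$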

The two $O(h^8)$ approximations are

$$R_{4,4} = R_{4,3} + \tfrac{1}{63}(R_{4,3} - R_{3,3}) = 2.00000555; \quad R_{5,4} = R_{5,3} + \tfrac{1}{63}(R_{5,3} - R_{4,3}) = 2.00000001,$$

and the final $O(h^{10})$ approximation is

$$R_{5,5} = R_{5,4} + \tfrac{1}{255}(R_{5,4} - R_{4,4}) = 1.99999999.$$

These results are shown in Table 4.9.

Table 4.9
0
1.57079633   2.09439511
1.89611890   2.00455976   1.99857073
1.97423160   2.00026917   1.99998313   2.00000555
1.99357034   2.00001659   1.99999975   2.00000001   1.99999999


Notice that when generating the Composite Trapezoidal rule approximations in Example 1,
each consecutive approximation included all the function evaluations from the previous
approximation. That is, $R_{1,1}$ used evaluations at 0 and $\pi$, and $R_{2,1}$
used these evaluations and added an evaluation at the intermediate point $\pi/2$. Then $R_{3,1}$
used the evaluations of $R_{2,1}$ and added two additional intermediate ones at $\pi/4$ and $3\pi/4$.
This pattern continues with $R_{4,1}$ using the same evaluations as $R_{3,1}$ but adding evaluations
at the 4 intermediate points $\pi/8$, $3\pi/8$, $5\pi/8$, and $7\pi/8$, and so on.

This evaluation procedure for Composite Trapezoidal rule approximations holds for an
integral on any interval $[a, b]$. In general, the Composite Trapezoidal rule denoted $R_{k+1,1}$
uses the same evaluations as $R_{k,1}$ but adds evaluations at the $2^{k-1}$ intermediate points.
Efficient calculation of these approximations can therefore be done in a recursive manner.

To obtain the Composite Trapezoidal rule approximations for $\int_a^b f(x)\,dx$, let $h_k = (b - a)/m_k = (b - a)/2^{k-1}$. Then

$$R_{1,1} = \frac{h_1}{2}\left[ f(a) + f(b) \right] = \frac{(b - a)}{2}\left[ f(a) + f(b) \right],$$

and

$$R_{2,1} = \frac{h_2}{2}\left[ f(a) + f(b) + 2f(a + h_2) \right].$$

By reexpressing this result for $R_{2,1}$ we can incorporate the previously determined approximation $R_{1,1}$:

$$R_{2,1} = \frac{(b - a)}{4}\left[ f(a) + f(b) + 2f\left( a + \frac{(b - a)}{2} \right) \right] = \frac{1}{2}\left[ R_{1,1} + h_1 f(a + h_2) \right].$$

In a similar manner we can write

$$R_{3,1} = \frac{1}{2}\left\{ R_{2,1} + h_2\left[ f(a + h_3) + f(a + 3h_3) \right] \right\};$$

and, in general (see Figure 4.10), we have

$$R_{k,1} = \frac{1}{2}\left[ R_{k-1,1} + h_{k-1} \sum_{i=1}^{2^{k-2}} f(a + (2i - 1)h_k) \right], \tag{4.34}$$

for each $k = 2, 3, \ldots, n$. (See Exercises 14 and 15.)

Figure 4.10


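Extrapolation then is used to produce $O(h_k^{2j})$ approximations by

$$R_{k,j} = R_{k,j-1} + \frac{1}{4^{j-1} - 1}(R_{k,j-1} - R_{k-1,j-1}), \quad \text{for } k = j, j + 1, \ldots$$

as shown in Table 4.10.


Table 4.10
k    O(h_k^2)   O(h_k^4)   O(h_k^6)   O(h_k^8)   ...   O(h_k^{2n})
1    R_{1,1}
2    R_{2,1}    R_{2,2}
3    R_{3,1}    R_{3,2}    R_{3,3}
4    R_{4,1}    R_{4,2}    R_{4,3}    R_{4,4}
...
n    R_{n,1}    R_{n,2}    R_{n,3}    R_{n,4}    ...   R_{n,n}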

The effective method to construct the Romberg table makes use of the highest order 
of approximation at each step. That is, it calculates the entries row by row, in the order 
Ri1, Roi, R22, R31, R32, R33, etc. This also permits an entire new row in the table to be 
calculated by doing only one additional application of the Composite Trapezoidal rule. It 
then uses a simple averaging on the previously calculated values to obtain the remaining 
entries in the row. Remember:


• Calculate the Romberg table one complete row at a time.


Example 2 Add an additional extrapolation row to Table 4.9 to approximate $\int_0^{\pi} \sin x\,dx$.


Solution To obtain the additional row we need the trapezoidal approximation

$$R_{6,1} = \frac{1}{2}\left[ R_{5,1} + \frac{\pi}{16} \sum_{k=1}^{2^4} \sin\frac{(2k - 1)\pi}{32} \right] = 1.99839336.$$

The values in Table 4.9 give

$$R_{6,2} = R_{6,1} + \tfrac{1}{3}(R_{6,1} - R_{5,1}) = 1.99839336 + \tfrac{1}{3}(1.99839336 - 1.99357035) = 2.00000103;$$

$$R_{6,3} = R_{6,2} + \tfrac{1}{15}(R_{6,2} - R_{5,2}) = 2.00000103 + \tfrac{1}{15}(2.00000103 - 2.00001659) = 2.00000000;$$

$$R_{6,4} = R_{6,3} + \tfrac{1}{63}(R_{6,3} - R_{5,3}) = 2.00000000;$$

$$R_{6,5} = R_{6,4} + \tfrac{1}{255}(R_{6,4} - R_{5,4}) = 2.00000000;$$

and $R_{6,6} = R_{6,5} + \tfrac{1}{1023}(R_{6,5} - R_{5,5}) = 2.00000000$. The new extrapolation table is shown
in Table 4.11.


Table 4.11
0
1.57079633   2.09439511
1.89611890   2.00455976   1.99857073
1.97423160   2.00026917   1.99998313   2.00000555
1.99357034   2.00001659   1.99999975   2.00000001   1.99999999
1.99839336   2.00000103   2.00000000   2.00000000   2.00000000   2.00000000


Notice that all the extrapolated values except for the first (in the first row of the second 
column) are more accurate than the best composite trapezoidal approximation (in the last row 
of the first column). Although there are 21 entries in Table 4.11, only the six in the left column 
require function evaluations since these are the only entries generated by the Composite 
Trapezoidal rule; the other entries are obtained by an averaging process. In fact, because 
of the recurrence relationship of the terms in the left column, the only function evaluations 
needed are those to compute the final Composite Trapezoidal rule approximation. In general, 
$R_{k,1}$ requires $1 + 2^{k-1}$ function evaluations, so in this case $1 + 2^5 = 33$ are needed.
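The recurrence for the left column and the evaluation count can both be checked with a short program. The Python sketch below is an illustration, not part of the original text; the function name `trapezoid_sequence` and the call counter are assumptions introduced for this example.

```python
import math

def trapezoid_sequence(f, a, b, rows):
    """Return ([R_{1,1}, ..., R_{rows,1}], number of evaluations of f).

    Each new entry reuses the previous one and evaluates f only at the
    new midpoints, as in Eq. (4.34).
    """
    calls = 0

    def g(x):
        nonlocal calls
        calls += 1
        return f(x)

    h = b - a                                   # h_1
    r = [h / 2.0 * (g(a) + g(b))]               # R_{1,1}
    for k in range(2, rows + 1):
        # h holds h_{k-1}; the new nodes a + (i - 1/2) h_{k-1}
        # are the same points as a + (2i - 1) h_k.
        new_points = sum(g(a + (i - 0.5) * h) for i in range(1, 2 ** (k - 2) + 1))
        r.append(0.5 * (r[-1] + h * new_points))
        h /= 2.0
    return r, calls

column, evaluations = trapezoid_sequence(math.sin, 0.0, math.pi, 6)
for value in column:
    print(f"{value:.8f}")
print("function evaluations:", evaluations)
```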

Algorithm 4.2 uses the recursive procedure to find the initial Composite Trapezoidal 
Rule approximations and computes the results in the table row by row. 


Romberg
To approximate the integral $I = \int_a^b f(x)\,dx$, select an integer n > 0.


INPUT endpoints a, b; integer n.
OUTPUT an array R. (Compute R by rows; only the last 2 rows are saved in storage.)
Step 1 Set h = b - a;
       R_{1,1} = (h/2)(f(a) + f(b)).
Step 2 OUTPUT (R_{1,1}).

Step 3 For i = 2, ..., n do Steps 4-8.

    Step 4 Set R_{2,1} = (1/2)[R_{1,1} + h * sum_{k=1}^{2^{i-2}} f(a + (k - 0.5)h)].
           (Approximation from Trapezoidal method.)

    Step 5 For j = 2, ..., i
           set R_{2,j} = R_{2,j-1} + (R_{2,j-1} - R_{1,j-1})/(4^{j-1} - 1). (Extrapolation.)

    Step 6 OUTPUT (R_{2,j} for j = 1, 2, ..., i).

    Step 7 Set h = h/2.

    Step 8 For j = 1, 2, ..., i set R_{1,j} = R_{2,j}. (Update row 1 of R.)

Step 9 STOP.
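Algorithm 4.2 can be sketched in Python as follows. This is one possible transcription under stated assumptions: the function name `romberg` is illustrative, and the routine returns every row for display even though, as in the algorithm, only the previous and current rows are needed for the computation.

```python
import math

def romberg(f, a, b, n):
    """Return the Romberg table for f on [a, b] as a list of n rows."""
    h = b - a
    prev = [h / 2.0 * (f(a) + f(b))]            # Step 1: R_{1,1}
    table = [prev]
    for i in range(2, n + 1):                   # Step 3
        # Step 4: trapezoidal refinement using only the new midpoints
        s = sum(f(a + (k - 0.5) * h) for k in range(1, 2 ** (i - 2) + 1))
        cur = [0.5 * (prev[0] + h * s)]
        # Step 5: extrapolation across the row
        for j in range(2, i + 1):
            cur.append(cur[j - 2] + (cur[j - 2] - prev[j - 2]) / (4 ** (j - 1) - 1))
        table.append(cur)                       # Step 6
        h /= 2.0                                # Step 7
        prev = cur                              # Step 8
    return table

for row in romberg(math.sin, 0.0, math.pi, 6):
    print("  ".join(f"{value:.8f}" for value in row))
```

Run on $\int_0^{\pi} \sin x\,dx$ with $n = 6$, the printed rows can be compared with Table 4.11.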


Algorithm 4.2 requires a preset integer $n$ to determine the number of rows to be generated.
We could also set an error tolerance for the approximation and generate $n$, within
some upper bound, until consecutive diagonal entries $R_{n-1,n-1}$ and $R_{n,n}$ agree to within
the tolerance. To guard against the possibility that two consecutive row elements agree
with each other but not with the value of the integral being approximated, it is common to
generate approximations until not only $|R_{n-1,n-1} - R_{n,n}|$ is within the tolerance, but also
$|R_{n-2,n-2} - R_{n-1,n-1}|$. Although not a universal safeguard, this will ensure that two differently
generated sets of approximations agree within the specified tolerance before $R_{n,n}$ is
accepted as sufficiently accurate.
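The double-agreement stopping test described above might be sketched as follows. The function name `romberg_to_tolerance`, the `max_rows` cap, and the choice to raise an exception when the cap is reached are all assumptions made for this illustration.

```python
import math

def romberg_to_tolerance(f, a, b, tol, max_rows=20):
    """Generate Romberg rows until the last three diagonal entries agree.

    Returns (R_{n,n}, n). Stops when both |R_{n-1,n-1} - R_{n,n}| and
    |R_{n-2,n-2} - R_{n-1,n-1}| are below tol.
    """
    h = b - a
    prev = [h / 2.0 * (f(a) + f(b))]
    diagonal = [prev[0]]                        # R_{1,1}, R_{2,2}, ...
    for i in range(2, max_rows + 1):
        s = sum(f(a + (k - 0.5) * h) for k in range(1, 2 ** (i - 2) + 1))
        cur = [0.5 * (prev[0] + h * s)]
        for j in range(2, i + 1):
            cur.append(cur[j - 2] + (cur[j - 2] - prev[j - 2]) / (4 ** (j - 1) - 1))
        diagonal.append(cur[-1])
        h /= 2.0
        prev = cur
        if (len(diagonal) >= 3
                and abs(diagonal[-1] - diagonal[-2]) < tol
                and abs(diagonal[-2] - diagonal[-3]) < tol):
            return cur[-1], i
    raise RuntimeError("tolerance not reached within max_rows rows")

value, rows_used = romberg_to_tolerance(math.sin, 0.0, math.pi, 1e-6)
print(value, rows_used)
```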

Romberg integration can be performed with the Quadrature command in the NumericalAnalysis
subpackage of Maple’s Student package. For example, after loading the package
and defining the function and interval, the command


Quadrature(f(x), x = a..b, method = romberg[6], output = information)


produces the values shown in Table 4.11 together with the information that 6 applications 
of the Trapezoidal rule were used and 33 function evaluations were required. 

Romberg integration applied to a function $f$ on the interval $[a, b]$ relies on the assumption
that the Composite Trapezoidal rule has an error term that can be expressed in the
form of Eq. (4.33); that is, we must have $f \in C^{2k+2}[a, b]$ for the $k$th row to be generated.
General-purpose algorithms using Romberg integration include a check at each stage to
ensure that this assumption is fulfilled. These methods are known as cautious Romberg
algorithms and are described in [Joh]. This reference also describes methods for using the
Romberg technique as an adaptive procedure, similar to the adaptive Simpson’s rule that
will be discussed in Section 4.6.

The adjective cautious used in the description of a numerical method indicates that a check is incorporated to determine if the continuity hypotheses are likely to be true.

EXERCISE SET 4.5


1. Use Romberg integration to compute $R_{3,3}$ for the following integrals.


1.5 1 
a. - x’ Inx dx b. / xe dx 
1 0 


035° 9 x/4 
Cc. i ao dx d. / x’ sinx dx 
0 a 0
a 16 9
e. e* sin 2x dx f. ——_ dx 
0 1 2-4 
3.5 m/4 
g. eee h. / (cos.x)? dx 
3 Ve —4 0 
2. Use Romberg integration to compute $R_{3,3}$ for the following integrals.
1 0.75 
a. i (cosx)? dx b. i xIn(x + 1) dx 
“1 —0.75 
4 2e 1 
c. / ((sinx)” — 2x sinx + 1) dx d. / dx 
1 . xInx 


3. Calculate $R_{4,4}$ for the integrals in Exercise 1.

4. Calculate $R_{4,4}$ for the integrals in Exercise 2.

5. Use Romberg integration to approximate the integrals in Exercise 1 to within $10^{-6}$. Compute the
Romberg table until either $|R_{n-1,n-1} - R_{n,n}| < 10^{-6}$, or $n = 10$. Compare your results to the exact
values of the integrals.

6. Use Romberg integration to approximate the integrals in Exercise 2 to within $10^{-6}$. Compute the
Romberg table until either $|R_{n-1,n-1} - R_{n,n}| < 10^{-6}$, or $n = 10$. Compare your results to the exact
values of the integrals.


7. Use the following data to approximate / ‘ Ff (x) dx as accurately as possible. 
x |i 2 | 3 | 4 | 5 
2.6734 | 2.8974 | 3.0976 | 3.2804 


Fee) | 2.4142 


8. Romberg integration is used to approximate 


1 x2 
[me 
0 1+x3 


If Ri, = 0.250 and Roy = 0.2315, what is Ry}? 


9. Romberg integration is used to approximate 


3 
/ F(x) dx. 
2 


If f(2) = 0.51342, f(3) = 0.36788, R3; = 0.43687, and R33 = 0.43662, find f(2.5). 

10. Romberg integration for approximating i F (x) dx gives Ri; = 4 and Ry = 5. Find f(1/2). 

11. Romberg integration for approximating i f(x) dx gives Ry, = 8, Rx» = 16/3, and R33 = 208/45. 
Find R31 n 


12. Use Romberg integration to compute the following approximations to 


48 
/ 1+ (cos x)? dx. 
0 


[Note: The results in this exercise are most interesting if you are using a device with between seven- 
and nine-digit arithmetic. ] 


a. Determine R11, R21, R31, Ra, and Rs, and use these approximations to predict the value of the 
integral. 


Determine R2>, R33, R44, and R55, and modify your prediction. 
Determine R61, Ro2, Ro3, Ro, Ros, and Ro, and modify your prediction. 


Determine R77, Rg, Ro9, and Rjo,19, and make a final prediction. 


i eC 


Explain why this integral causes difficulty with Romberg integration and how it can be reformu- 

lated to more easily determine an accurate approximation. 

13. Show that the approximation obtained from $R_{k,2}$ is the same as that given by the Composite Simpson’s
rule described in Theorem 4.4 with $h = h_k$.
14. Show that, for any $k$,

$$\sum_{i=1}^{2^{k-1}-1} f\left( a + \frac{i}{2} h_{k-1} \right) = \sum_{i=1}^{2^{k-2}} f\left( a + \left( i - \frac{1}{2} \right) h_{k-1} \right) + \sum_{i=1}^{2^{k-2}-1} f(a + i h_{k-1}).$$

15. Use the result of Exercise 14 to verify Eq. (4.34); that is, show that for all $k$,

$$R_{k,1} = \frac{1}{2}\left[ R_{k-1,1} + h_{k-1} \sum_{i=1}^{2^{k-2}} f\left( a + \left( i - \frac{1}{2} \right) h_{k-1} \right) \right].$$


16. In Exercise 26 of Section 1.1, a Maclaurin series was integrated to approximate erf(1), where erf(x)
is the normal distribution error function defined by

$$\operatorname{erf}(x) = \frac{2}{\sqrt{\pi}} \int_0^x e^{-t^2}\,dt.$$

Approximate erf(1) to within $10^{-7}$.


4.6 Adaptive Quadrature Methods


The composite formulas are very effective in most situations, but they suffer occasionally 
because they require the use of equally-spaced nodes. This is inappropriate when integrating 
a function on an interval that contains both regions with large functional variation and regions 
with small functional variation. 


Illustration The unique solution to the differential equation $y'' + 6y' + 25y = 0$ that additionally satisfies
$y(0) = 0$ and $y'(0) = 4$ is $y(x) = e^{-3x} \sin 4x$. Functions of this type are common in
mechanical engineering because they describe certain features of spring and shock absorber 
systems, and in electrical engineering because they are common solutions to elementary 
circuit problems. The graph of y(x) for x in the interval [0, 4] is shown in Figure 4.11. 


Figure 4.11 (graph of $y(x) = e^{-3x} \sin 4x$)

Suppose that we need the integral of $y(x)$ on $[0, 4]$. The graph indicates that the integral on
[3,4] must be very close to 0, and on [2, 3] would also not be expected to be large. However, 
on [0, 2] there is significant variation of the function and it is not at all clear what the integral 
is on this interval. This is an example of a situation where composite integration would be 
inappropriate. A very low order method could be used on [2, 4], but a higher-order method 
would be necessary on [0, 2]. 


The question we will consider in this section is: 


• How can we determine what technique should be applied on various portions of the
interval of integration, and how accurate can we expect the final approximation to be? 


We will see that under quite reasonable conditions we can answer this question and also 
determine approximations that satisfy given accuracy requirements. 

If the approximation error for an integral on a given interval is to be evenly distributed, 
a smaller step size is needed for the large-variation regions than for those with less variation. 
An efficient technique for this type of problem should predict the amount of functional vari- 
ation and adapt the step size as necessary. These methods are called Adaptive quadrature 
methods. Adaptive methods are particularly popular for inclusion in professional software 
packages because, in addition to being efficient, they generally provide approximations that 
are within a given specified tolerance. 

In this section we consider an Adaptive quadrature method and see how it can be used to 
reduce approximation error and also to predict an error estimate for the approximation that 
does not rely on knowledge of higher derivatives of the function. The method we discuss 
is based on the Composite Simpson’s rule, but the technique is easily modified to use other 
composite procedures. 

Suppose that we want to approximate $\int_a^b f(x)\,dx$ to within a specified tolerance $\varepsilon > 0$.
The first step is to apply Simpson’s rule with step size $h = (b - a)/2$. This produces (see
Figure 4.12)

$$\int_a^b f(x)\,dx = S(a, b) - \frac{h^5}{90} f^{(4)}(\xi), \quad \text{for some } \xi \text{ in } (a, b), \tag{4.35}$$

where we denote the Simpson’s rule approximation on $[a, b]$ by

$$S(a, b) = \frac{h}{3}\left[ f(a) + 4f(a + h) + f(b) \right].$$

Figure 4.12

The next step is to determine an accuracy approximation that does not require $f^{(4)}(\xi)$.
To do this, we apply the Composite Simpson’s rule with $n = 4$ and step size $(b - a)/4 = h/2$,
giving

$$\int_a^b f(x)\,dx = \frac{h}{6}\left[ f(a) + 4f\left( a + \frac{h}{2} \right) + 2f(a + h) + 4f\left( a + \frac{3h}{2} \right) + f(b) \right] - \left( \frac{h}{2} \right)^4 \frac{(b - a)}{180} f^{(4)}(\tilde{\xi}), \tag{4.36}$$


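for some $\tilde{\xi}$ in $(a, b)$. To simplify notation, let

$$S\left( a, \frac{a + b}{2} \right) = \frac{h}{6}\left[ f(a) + 4f\left( a + \frac{h}{2} \right) + f(a + h) \right]$$

and

$$S\left( \frac{a + b}{2}, b \right) = \frac{h}{6}\left[ f(a + h) + 4f\left( a + \frac{3h}{2} \right) + f(b) \right].$$

Then Eq. (4.36) can be rewritten (see Figure 4.13) as

$$\int_a^b f(x)\,dx = S\left( a, \frac{a + b}{2} \right) + S\left( \frac{a + b}{2}, b \right) - \frac{1}{16}\left( \frac{h^5}{90} \right) f^{(4)}(\tilde{\xi}). \tag{4.37}$$

Figure 4.13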


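The error estimation is derived by assuming that $\xi \approx \tilde{\xi}$ or, more precisely, that $f^{(4)}(\xi) \approx f^{(4)}(\tilde{\xi})$,
and the success of the technique depends on the accuracy of this assumption. If it
is accurate, then equating the integrals in Eqs. (4.35) and (4.37) gives

$$S\left( a, \frac{a + b}{2} \right) + S\left( \frac{a + b}{2}, b \right) - \frac{1}{16}\left( \frac{h^5}{90} \right) f^{(4)}(\xi) \approx S(a, b) - \frac{h^5}{90} f^{(4)}(\xi),$$

so

$$\frac{h^5}{90} f^{(4)}(\xi) \approx \frac{16}{15}\left[ S(a, b) - S\left( a, \frac{a + b}{2} \right) - S\left( \frac{a + b}{2}, b \right) \right].$$

Using this estimate in Eq. (4.37) produces the error estimation

$$\left| \int_a^b f(x)\,dx - S\left( a, \frac{a + b}{2} \right) - S\left( \frac{a + b}{2}, b \right) \right| \approx \left| \frac{1}{16}\left( \frac{h^5}{90} \right) f^{(4)}(\xi) \right| \approx \frac{1}{15}\left| S(a, b) - S\left( a, \frac{a + b}{2} \right) - S\left( \frac{a + b}{2}, b \right) \right|.$$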


This implies that S(a, (a+ b)/2) + S((a+b)/2, b) approximates ii Ff (x) dx about 15 times 
better than it agrees with the computed value S(a, b). Thus, if 


sian s (a=) s(=.») 
b 
[ fou s (a *5*) s(*F.6}] <s. (4.39) 


i; 
ea” gl 2s 
2 2 


; . By b 
is assumed to be a sufficiently accurate approximation to i, _ f(x) dx. 


< 15s, (4.38) 


we expect to have 


and 


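Example 1 Check the accuracy of the error estimate given in (4.38) and (4.39) when applied to the integral

$$\int_0^{\pi/2} \sin x\,dx = 1$$

by comparing

$$\frac{1}{15}\left|S\left(0,\frac{\pi}{2}\right) - S\left(0,\frac{\pi}{4}\right) - S\left(\frac{\pi}{4},\frac{\pi}{2}\right)\right| \quad\text{to}\quad \left|\int_0^{\pi/2} \sin x\,dx - S\left(0,\frac{\pi}{4}\right) - S\left(\frac{\pi}{4},\frac{\pi}{2}\right)\right|.$$

Solution We have

$$S\left(0,\frac{\pi}{2}\right) = \frac{\pi/4}{3}\left[\sin 0 + 4\sin\frac{\pi}{4} + \sin\frac{\pi}{2}\right] = \frac{\pi}{12}\left(2\sqrt{2} + 1\right) = 1.002279878$$

and

$$S\left(0,\frac{\pi}{4}\right) + S\left(\frac{\pi}{4},\frac{\pi}{2}\right) = \frac{\pi/8}{3}\left[\sin 0 + 4\sin\frac{\pi}{8} + 2\sin\frac{\pi}{4} + 4\sin\frac{3\pi}{8} + \sin\frac{\pi}{2}\right] = 1.000134585.$$

So

$$\left|S\left(0,\frac{\pi}{2}\right) - S\left(0,\frac{\pi}{4}\right) - S\left(\frac{\pi}{4},\frac{\pi}{2}\right)\right| = |1.002279878 - 1.000134585| = 0.002145293.$$

The estimate for the error obtained when using $S(a, (a+b)/2) + S((a+b)/2, b)$ to approximate $\int_a^b f(x)\,dx$ is consequently

$$\frac{1}{15}\left|S\left(0,\frac{\pi}{2}\right) - S\left(0,\frac{\pi}{4}\right) - S\left(\frac{\pi}{4},\frac{\pi}{2}\right)\right| = 0.000143020,$$

which closely approximates the actual error

$$\left|\int_0^{\pi/2} \sin x\,dx - 1.000134585\right| = 0.000134585,$$

even though $D_x^4 \sin x = \sin x$ varies significantly in the interval $(0, \pi/2)$.

It is a good idea to include a margin of safety when it is impossible to verify accuracy assumptions.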
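The comparison in this example is easy to reproduce numerically. The following is a minimal Python sketch, not part of the text's algorithms; the helper name `simpson` and the variable names are chosen here only for illustration.

```python
import math

def simpson(f, a, b):
    """Basic Simpson's rule S(a, b) with step size h = (b - a)/2."""
    h = (b - a) / 2.0
    return h * (f(a) + 4.0 * f(a + h) + f(b)) / 3.0

a, b = 0.0, math.pi / 2.0
mid = (a + b) / 2.0

whole = simpson(math.sin, a, b)                               # S(0, pi/2)
halves = simpson(math.sin, a, mid) + simpson(math.sin, mid, b)

estimate = abs(whole - halves) / 15.0   # error estimate from (4.38)-(4.39)
actual = abs(1.0 - halves)              # true error, since the integral is 1

print(f"S(a,b)          = {whole:.9f}")
print(f"sum of halves   = {halves:.9f}")
print(f"estimated error = {estimate:.9f}")
print(f"actual error    = {actual:.9f}")
```

The estimate uses only quantities that are computed anyway, which is what makes it practical when the fourth derivative is unavailable.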


When the approximations in (4.38) differ by more than $15\varepsilon$, we can apply the Simpson’s rule technique individually to the subintervals $[a, (a + b)/2]$ and $[(a + b)/2, b]$. Then we use the error estimation procedure to determine if the approximation to the integral on each subinterval is within a tolerance of $\varepsilon/2$. If so, we sum the approximations to produce an approximation to $\int_a^b f(x)\,dx$ within the tolerance $\varepsilon$.

If the approximation on one of the subintervals fails to be within the tolerance $\varepsilon/2$, then that subinterval is itself subdivided, and the procedure is reapplied to the two subintervals to determine if the approximation on each subinterval is accurate to within $\varepsilon/4$. This halving procedure is continued until each portion is within the required tolerance.

Problems can be constructed for which this tolerance will never be met, but the tech- 
nique is usually successful, because each subdivision typically increases the accuracy of 
the approximation by a factor of 16 while requiring an increased accuracy factor of only 2. 

Algorithm 4.3 details this Adaptive quadrature procedure for Simpson’s rule, although some technical difficulties arise that require the implementation to differ slightly from the preceding discussion. For example, in Step 1 the tolerance has been set at $10\varepsilon$ rather than the $15\varepsilon$ figure in Inequality (4.38). This bound is chosen conservatively to compensate for error in the assumption $f^{(4)}(\xi) \approx f^{(4)}(\tilde{\xi})$. In problems where $f^{(4)}$ is known to be widely varying, this bound should be decreased even further.

The procedure listed in the algorithm first approximates the integral on the leftmost 
subinterval in a subdivision. This requires the efficient storing and recalling of previously 
computed functional evaluations for the nodes in the right half subintervals. Steps 3, 4, 
and 5 contain a stacking procedure with an indicator to keep track of the data that will be 
required for calculating the approximation on the subinterval immediately adjacent and to 
the right of the subinterval on which the approximation is being generated. The method is 
easier to implement using a recursive programming language. 


Adaptive Quadrature 
To approximate the integral $I = \int_a^b f(x)\,dx$ to within a given tolerance:

INPUT endpoints a, b; tolerance TOL; limit N to number of levels.

OUTPUT approximation APP or message that N is exceeded.

Step 1 Set APP = 0;
           i = 1;
           TOL_i = 10 TOL;
           a_i = a;
           h_i = (b - a)/2;
           FA_i = f(a);
           FC_i = f(a + h_i);
           FB_i = f(b);
           S_i = h_i(FA_i + 4FC_i + FB_i)/3; (Approximation from Simpson’s method for entire interval.)
           L_i = 1.

Step 2 While i > 0 do Steps 3-5.

    Step 3 Set FD = f(a_i + h_i/2);
               FE = f(a_i + 3h_i/2);
               S1 = h_i(FA_i + 4FD + FC_i)/6; (Approximations from Simpson’s method for halves of subintervals.)
               S2 = h_i(FC_i + 4FE + FB_i)/6;
               v_1 = a_i; (Save data at this level.)
               v_2 = FA_i;
               v_3 = FC_i;
               v_4 = FB_i;
               v_5 = h_i;
               v_6 = TOL_i;
               v_7 = S_i;
               v_8 = L_i.

    Step 4 Set i = i - 1. (Delete the level.)

    Step 5 If |S1 + S2 - v_7| < v_6
               then set APP = APP + (S1 + S2)
               else
                   if (v_8 ≥ N)
                       then
                           OUTPUT ('LEVEL EXCEEDED'); (Procedure fails.)
                           STOP.
                       else (Add one level.)
                           set i = i + 1; (Data for right half subinterval.)
                               a_i = v_1 + v_5;
                               FA_i = v_3;
                               FC_i = FE;
                               FB_i = v_4;
                               h_i = v_5/2;
                               TOL_i = v_6/2;
                               S_i = S2;
                               L_i = v_8 + 1;
                           set i = i + 1; (Data for left half subinterval.)
                               a_i = v_1;
                               FA_i = v_2;
                               FC_i = FD;
                               FB_i = v_3;
                               h_i = h_{i-1};
                               TOL_i = TOL_{i-1};
                               S_i = S1;
                               L_i = L_{i-1}.

Step 6 OUTPUT (APP); (APP approximates I to within TOL.)
       STOP.
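The stacking procedure can be sketched in Python with an explicit list used as the stack. This is an illustrative sketch rather than a transcription of the algorithm: the function name `adaptive_simpson`, the tuple layout, the `max_levels` parameter, and the test integrand (chosen here because it oscillates and has the known antiderivative $10\cos(10/x)$) are all assumptions made for this example.

```python
import math

def adaptive_simpson(f, a, b, tol, max_levels=50):
    """Adaptive Simpson quadrature using an explicit stack.

    Each stack entry is (left endpoint, h, FA, FC, FB, S, tolerance, level)
    for one subinterval [left, left + 2h]. The initial tolerance is 10*tol,
    the conservative choice described for Step 1.
    Returns (approximation, number of function evaluations).
    """
    h = (b - a) / 2.0
    fa, fc, fb = f(a), f(a + h), f(b)
    s = h * (fa + 4.0 * fc + fb) / 3.0
    stack = [(a, h, fa, fc, fb, s, 10.0 * tol, 1)]
    total = 0.0
    evaluations = 3

    while stack:
        left, h_i, fa, fc, fb, s, tol_i, level = stack.pop()
        fd = f(left + h_i / 2.0)          # midpoint of the left half
        fe = f(left + 3.0 * h_i / 2.0)    # midpoint of the right half
        evaluations += 2
        s1 = h_i * (fa + 4.0 * fd + fc) / 6.0
        s2 = h_i * (fc + 4.0 * fe + fb) / 6.0
        if abs(s1 + s2 - s) < tol_i:
            total += s1 + s2              # accept this subinterval
        elif level >= max_levels:
            raise RuntimeError("LEVEL EXCEEDED")
        else:
            # Push the right half first so the left half is processed next.
            stack.append((left + h_i, h_i / 2.0, fc, fe, fb, s2,
                          tol_i / 2.0, level + 1))
            stack.append((left, h_i / 2.0, fa, fd, fc, s1,
                          tol_i / 2.0, level + 1))
    return total, evaluations

def g(x):
    return (100.0 / x**2) * math.sin(10.0 / x)

approx, evals = adaptive_simpson(g, 1.0, 3.0, 1e-4)
exact = 10.0 * (math.cos(10.0 / 3.0) - math.cos(10.0))
print(approx, evals, abs(approx - exact))
```

Previously computed function values are carried in the stack entries, so each subdivision costs only two new evaluations.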


Illustration The graph of the function $f(x) = (100/x^2)\sin(10/x)$ for $x$ in $[1, 3]$ is shown in Figure 4.14. Using the Adaptive Quadrature Algorithm 4.3 with tolerance $10^{-4}$ to approximate $\int_1^3 f(x)\,dx$ produces −1.426014, a result that is accurate to within $1.1 \times 10^{-5}$. The approximation required that Simpson’s rule with n = 4 be performed on the 23 subintervals whose endpoints are shown on the horizontal axis in Figure 4.14. The total number of functional evaluations required for this approximation is 93.

Figure 4.14 (graph of $y = f(x) = \frac{100}{x^2}\sin\left(\frac{10}{x}\right)$ on $[1, 3]$)

The largest value of h for which the standard Composite Simpson’s rule gives $10^{-4}$ accuracy is h = 1/88. This application requires 177 function evaluations, nearly twice as many as Adaptive quadrature.


Adaptive quadrature can be performed with the Quadrature command in the NumericalAnalysis subpackage of Maple’s Student package. In this situation the option adaptive = true is used. For example, to produce the values in the Illustration we first load the package and define the function and interval with

f := x -> (100/x^2)*sin(10/x); a := 1.0; b := 3.0

Then give the NumericalAnalysis command

Quadrature(f(x), x = a..b, adaptive = true, method = [simpson, 10^(-4)], output = information)

This produces the approximation −1.42601481 and a table that lists all the intervals on which Simpson’s rule was employed and whether the appropriate tolerance was satisfied (indicated by the word PASS) or was not satisfied (indicated by the word fail). It also gives what Maple thinks is the correct value of the integral to the decimal places listed, in this case −1.42602476. Then it gives the absolute and relative errors, $9.946 \times 10^{-6}$ and $6.975 \times 10^{-6}$, respectively, assuming that its correct value is accurate.


EXERCISE SET 4.6


1. Compute the Simpson’s rule approximations S(a, b), S(a, (a + b)/2), and S((a + b)/2,b) for the 
following integrals, and verify the estimate given in the approximation formula. 


15 1 
a. i x? Inx dx b. / xe dx 
1 0 


035 9 1/4 
c. / ——— dx d. / x? sinx dx 
0 «w-4 0 
1/4 16 9 
e. / e** sin 2x dx f. : sie dx 
0 1 2-4 
3.5 x/4 
g. = dx h. / (cos x)* dx 
3 V¥e—4 0 


2. Use Adaptive quadrature to find approximations to within 10~? for the integrals in Exercise 1. Do not 
use a computer program to generate these results. 


3. Use Adaptive quadrature to approximate the following integrals to within 107>. 


3 3 
a. / e** sin 3x dx b. / e** sin 2x dx 
_ . 
c. 7 (2x cos(2x) — (x — 2)*) dx d. / (4x cos(2x) — (x — 2)*) dx 
0 0 
4. Use Adaptive quadrature to approximate the following integrals to within 10~>. 
™ 2 
a. j (sinx + cosx) dx b. / (x + sin 4x) dx 
0 1 
1 x/2 
c. / x sin 4x dx d. (6cos 4x + 4 sin 6x)e* dx 
=i 0 


5. Use Simpson’s Composite rule with n = 4, 6, 8,..., until successive approximations to the following 
integrals agree to within 10~°. Determine the number of nodes required. Use the Adaptive Quadrature 
Algorithm to approximate the integral to within 10~°, and count the number of nodes. Did Adaptive 
quadrature produce any improvement? 


8 w 
a. / xcosx? dx b. / xsinx? dx 
0 0 


au Tu 
e . 
c. / x” cos x dx d. / x? sinx dx 
0 0 


6. Sketch the graphs of sin(1/x) and cos(1/x) on [0.1, 2]. Use Adaptive quadrature to approximate the 
following integrals to within 10~°. 


2 ll | 
a. / sin — dx b. i cos — dx 
0.1 x 0.1 x 
7. The differential equation

$$m u''(t) + k u(t) = F_0 \cos \omega t$$

describes a spring-mass system with mass m, spring constant k, and no applied damping. The term $F_0 \cos \omega t$ describes a periodic external force applied to the system. The solution to the equation when the system is initially at rest ($u'(0) = u(0) = 0$) is

$$u(t) = \frac{F_0}{m(\omega_0^2 - \omega^2)}\left(\cos \omega t - \cos \omega_0 t\right), \quad\text{where}\quad \omega_0 = \sqrt{\frac{k}{m}} \ne \omega.$$

Sketch the graph of u when m = 1, k = 9, $F_0 = 1$, $\omega = 2$, and $t \in [0, 2\pi]$. Approximate $\int_0^{2\pi} u(t)\,dt$ to within $10^{-4}$.


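8. If the term $cu'(t)$ is added to the left side of the motion equation in Exercise 7, the resulting differential equation describes a spring-mass system that is damped with damping constant $c \ne 0$. The solution to this equation when the system is initially at rest is

$$u(t) = c_1 e^{r_1 t} + c_2 e^{r_2 t} + \frac{F_0}{c^2\omega^2 + m^2(\omega_0^2 - \omega^2)^2}\left(c\,\omega \sin \omega t + m(\omega_0^2 - \omega^2)\cos \omega t\right),$$

where

$$r_1 = \frac{-c + \sqrt{c^2 - 4\omega_0^2 m^2}}{2m} \quad\text{and}\quad r_2 = \frac{-c - \sqrt{c^2 - 4\omega_0^2 m^2}}{2m}.$$

a. Let m = 1, k = 9, $F_0 = 1$, c = 10, and $\omega = 2$. Find the values of $c_1$ and $c_2$ so that $u(0) = u'(0) = 0$.

b. Sketch the graph of u(t) for $t \in [0, 2\pi]$ and approximate $\int_0^{2\pi} u(t)\,dt$ to within $10^{-4}$.

9. Let $T(a,b)$ and $T(a, \frac{a+b}{2}) + T(\frac{a+b}{2}, b)$ be the single and double applications of the Trapezoidal rule to $\int_a^b f(x)\,dx$. Derive the relationship between

$$\left|T(a,b) - T\left(a,\frac{a+b}{2}\right) - T\left(\frac{a+b}{2},b\right)\right|$$

and

$$\left|\int_a^b f(x)\,dx - T\left(a,\frac{a+b}{2}\right) - T\left(\frac{a+b}{2},b\right)\right|.$$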

10. The study of light diffraction at a rectangular aperture involves the Fresnel integrals

$$c(t) = \int_0^t \cos\frac{\pi}{2}\omega^2\,d\omega \quad\text{and}\quad s(t) = \int_0^t \sin\frac{\pi}{2}\omega^2\,d\omega.$$

Construct a table of values for c(t) and s(t) that is accurate to within $10^{-4}$ for values of t = 0.1, 0.2, ..., 1.0.


4.7 Gaussian Quadrature


The Newton-Cotes formulas in Section 4.3 were derived by integrating interpolating poly- 
nomials. The error term in the interpolating polynomial of degree n involves the (n + 1)st 
derivative of the function being approximated, so a Newton-Cotes formula is exact when 
approximating the integral of any polynomial of degree less than or equal to n. 

All the Newton-Cotes formulas use values of the function at equally-spaced points. 
This restriction is convenient when the formulas are combined to form the composite rules 
we considered in Section 4.4, but it can significantly decrease the accuracy of the approx- 
imation. Consider, for example, the Trapezoidal rule applied to determine the integrals of 
the functions whose graphs are shown in Figure 4.15.

Figure 4.16


The Trapezoidal rule approximates the integral of the function by integrating the linear 
function that joins the endpoints of the graph of the function. But this is not likely the best 
line for approximating the integral. Lines such as those shown in Figure 4.16 would likely 
give much better approximations in most cases. 


Gauss demonstrated his method 
of efficient numerical integration 
in a paper that was presented to 
the Gottingen Society in 1814. 
He let the nodes as well as the 
coefficients of the function 
evaluations be parameters in the 
summation formula and found 
the optimal placement of the 
nodes. Goldstine [Golds], 

pp 224-232, has an interesting 
description of his development. 


Gaussian quadrature chooses the points for evaluation in an optimal, rather than equally-spaced, way. The nodes $x_1, x_2, \ldots, x_n$ in the interval $[a, b]$ and coefficients $c_1, c_2, \ldots, c_n$ are chosen to minimize the expected error obtained in the approximation

$$\int_a^b f(x)\,dx \approx \sum_{i=1}^n c_i f(x_i).$$


To measure this accuracy, we assume that the best choice of these values produces the exact 
result for the largest class of polynomials, that is, the choice that gives the greatest degree 
of precision. 

The coefficients $c_1, c_2, \ldots, c_n$ in the approximation formula are arbitrary, and the nodes $x_1, x_2, \ldots, x_n$ are restricted only by the fact that they must lie in $[a, b]$, the interval of integration. This gives us 2n parameters to choose. If the coefficients of a polynomial are considered parameters, the class of polynomials of degree at most 2n − 1 also contains 2n parameters. This, then, is the largest class of polynomials for which it is reasonable to expect a formula to be exact. With the proper choice of the values and constants, exactness on this set can be obtained.

To illustrate the procedure for choosing the appropriate parameters, we will show how 
to select the coefficients and nodes when n = 2 and the interval of integration is [—1, 1]. We 
will then discuss the more general situation for an arbitrary choice of nodes and coefficients 
and show how the technique is modified when integrating over an arbitrary interval. 

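Suppose we want to determine $c_1$, $c_2$, $x_1$, and $x_2$ so that the integration formula

$$\int_{-1}^1 f(x)\,dx \approx c_1 f(x_1) + c_2 f(x_2)$$

gives the exact result whenever f(x) is a polynomial of degree 2(2) − 1 = 3 or less, that is, when

$$f(x) = a_0 + a_1 x + a_2 x^2 + a_3 x^3,$$

for some collection of constants, $a_0$, $a_1$, $a_2$, and $a_3$. Because

$$\int (a_0 + a_1 x + a_2 x^2 + a_3 x^3)\,dx = a_0 \int 1\,dx + a_1 \int x\,dx + a_2 \int x^2\,dx + a_3 \int x^3\,dx,$$

this is equivalent to showing that the formula gives exact results when f(x) is 1, x, $x^2$, and $x^3$. Hence, we need $c_1$, $c_2$, $x_1$, and $x_2$, so that

$$c_1 \cdot 1 + c_2 \cdot 1 = \int_{-1}^1 1\,dx = 2, \qquad c_1 x_1 + c_2 x_2 = \int_{-1}^1 x\,dx = 0,$$

$$c_1 x_1^2 + c_2 x_2^2 = \int_{-1}^1 x^2\,dx = \frac{2}{3}, \quad\text{and}\quad c_1 x_1^3 + c_2 x_2^3 = \int_{-1}^1 x^3\,dx = 0.$$

A little algebra shows that this system of equations has the unique solution

$$c_1 = 1, \quad c_2 = 1, \quad x_1 = -\frac{\sqrt{3}}{3}, \quad\text{and}\quad x_2 = \frac{\sqrt{3}}{3},$$

which gives the approximation formula

$$\int_{-1}^1 f(x)\,dx \approx f\left(-\frac{\sqrt{3}}{3}\right) + f\left(\frac{\sqrt{3}}{3}\right). \tag{4.40}$$

This formula has degree of precision 3, that is, it produces the exact result for every polynomial of degree 3 or less.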
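The degree of precision of the two-point formula can be checked directly on the monomials. The short Python sketch below does this; the names `gauss2` and `exact_monomial` are introduced here only for illustration.

```python
import math

def gauss2(f):
    """Two-point Gaussian rule on [-1, 1]: f(-sqrt(3)/3) + f(sqrt(3)/3)."""
    x = math.sqrt(3.0) / 3.0
    return f(-x) + f(x)

def exact_monomial(k):
    """Exact value of the integral of x**k over [-1, 1]."""
    return 0.0 if k % 2 else 2.0 / (k + 1)

# Error of the rule on 1, x, x^2, x^3, x^4.
errors = [abs(gauss2(lambda x, k=k: x**k) - exact_monomial(k))
          for k in range(5)]

for k, err in enumerate(errors):
    print(f"x^{k}: error = {err:.3e}")
```

The first four errors are at the level of rounding error, while the error for $x^4$ is not, which is consistent with a degree of precision of exactly 3.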


Legendre Polynomials 


The technique we have described could be used to determine the nodes and coefficients for 
formulas that give exact results for higher-degree polynomials, but an alternative method 
obtains them more easily. In Sections 8.2 and 8.3 we will consider various collections of 
orthogonal polynomials, functions that have the property that a particular definite integral 
of the product of any two of them is 0. The set that is relevant to our problem is the Legendre 
polynomials, a collection $\{P_0(x), P_1(x), \ldots, P_n(x), \ldots\}$ with properties:

(1) For each n, $P_n(x)$ is a monic polynomial of degree n.

(2) $\int_{-1}^1 P(x)P_n(x)\,dx = 0$ whenever P(x) is a polynomial of degree less than n.

Recall that monic polynomials have leading coefficient 1.

Adrien-Marie Legendre (1752-1833) introduced this set of polynomials in 1785. He had numerous priority disputes with Gauss, primarily due to Gauss’ failure to publish many of his original results until long after he had discovered them.

The first few Legendre polynomials are

$$P_0(x) = 1, \quad P_1(x) = x, \quad P_2(x) = x^2 - \frac{1}{3},$$

$$P_3(x) = x^3 - \frac{3}{5}x, \quad\text{and}\quad P_4(x) = x^4 - \frac{6}{7}x^2 + \frac{3}{35}.$$

The roots of these polynomials are distinct, lie in the interval (−1, 1), have a symmetry with respect to the origin, and, most importantly, are the correct choice for determining the parameters that give us the nodes and coefficients for our quadrature method.

The nodes $x_1, x_2, \ldots, x_n$ needed to produce an integral approximation formula that gives exact results for any polynomial of degree less than 2n are the roots of the nth-degree Legendre polynomial. This is established by the following result.

Theorem 4.7 Suppose that $x_1, x_2, \ldots, x_n$ are the roots of the nth Legendre polynomial $P_n(x)$ and that for each $i = 1, 2, \ldots, n$, the numbers $c_i$ are defined by

$$c_i = \int_{-1}^1 \prod_{\substack{j=1 \\ j \ne i}}^n \frac{x - x_j}{x_i - x_j}\,dx.$$

If P(x) is any polynomial of degree less than 2n, then

$$\int_{-1}^1 P(x)\,dx = \sum_{i=1}^n c_i P(x_i).$$


Proof Let us first consider the situation for a polynomial P(x) of degree less than n. Rewrite P(x) in terms of (n − 1)st Lagrange coefficient polynomials with nodes at the roots of the nth Legendre polynomial $P_n(x)$. The error term for this representation involves the nth derivative of P(x). Since P(x) is of degree less than n, the nth derivative of P(x) is 0, and this representation is exact. So

$$P(x) = \sum_{i=1}^n P(x_i)L_i(x) = \sum_{i=1}^n \prod_{\substack{j=1 \\ j \ne i}}^n \frac{x - x_j}{x_i - x_j}\,P(x_i)$$

and

$$\int_{-1}^1 P(x)\,dx = \int_{-1}^1 \left[\sum_{i=1}^n \prod_{\substack{j=1 \\ j \ne i}}^n \frac{x - x_j}{x_i - x_j}\,P(x_i)\right] dx = \sum_{i=1}^n \left[\int_{-1}^1 \prod_{\substack{j=1 \\ j \ne i}}^n \frac{x - x_j}{x_i - x_j}\,dx\right] P(x_i) = \sum_{i=1}^n c_i P(x_i).$$

Hence the result is true for polynomials of degree less than n.

Now consider a polynomial P(x) of degree at least n but less than 2n. Divide P(x) by the nth Legendre polynomial $P_n(x)$. This gives two polynomials Q(x) and R(x), each of degree less than n, with


$$P(x) = Q(x)P_n(x) + R(x).$$

Note that $x_i$ is a root of $P_n(x)$ for each $i = 1, 2, \ldots, n$, so we have

$$P(x_i) = Q(x_i)P_n(x_i) + R(x_i) = R(x_i).$$

We now invoke the unique power of the Legendre polynomials. First, the degree of the polynomial Q(x) is less than n, so (by Legendre property (2)),

$$\int_{-1}^1 Q(x)P_n(x)\,dx = 0.$$

Then, since R(x) is a polynomial of degree less than n, the opening argument implies that

$$\int_{-1}^1 R(x)\,dx = \sum_{i=1}^n c_i R(x_i).$$

Putting these facts together verifies that the formula is exact for the polynomial P(x):

$$\int_{-1}^1 P(x)\,dx = \int_{-1}^1 [Q(x)P_n(x) + R(x)]\,dx = \int_{-1}^1 R(x)\,dx = \sum_{i=1}^n c_i R(x_i) = \sum_{i=1}^n c_i P(x_i).$$

The constants $c_i$ needed for the quadrature rule can be generated from the equation in Theorem 4.7, but both these constants and the roots of the Legendre polynomials are extensively tabulated. Table 4.12 lists these values for n = 2, 3, 4, and 5.

Table 4.12

n    Roots r_{n,i}      Coefficients c_{n,i}
2     0.5773502692      1.0000000000
     -0.5773502692      1.0000000000
3     0.7745966692      0.5555555556
      0.0000000000      0.8888888889
     -0.7745966692      0.5555555556
4     0.8611363116      0.3478548451
      0.3399810436      0.6521451549
     -0.3399810436      0.6521451549
     -0.8611363116      0.3478548451
5     0.9061798459      0.2369268850
      0.5384693101      0.4786286705
      0.0000000000      0.5688888889
     -0.5384693101      0.4786286705
     -0.9061798459      0.2369268850

Example 1 Approximate $\int_{-1}^1 e^x \cos x\,dx$ using Gaussian quadrature with n = 3.

Solution The entries in Table 4.12 give us

$$\int_{-1}^1 e^x \cos x\,dx \approx 0.\overline{5}\,e^{0.774596692}\cos 0.774596692 + 0.\overline{8}\cos 0 + 0.\overline{5}\,e^{-0.774596692}\cos(-0.774596692) = 1.9333904.$$

Integration by parts can be used to show that the true value of the integral is 1.9334214, so the absolute error is less than $3.2 \times 10^{-5}$.
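The three-point computation in this example can be reproduced with a few lines of Python. This is a minimal sketch that hard-codes the n = 3 roots and coefficients from Table 4.12; the names `NODES_3`, `WEIGHTS_3`, and `gauss3` are chosen here for illustration.

```python
import math

# n = 3 roots and coefficients as tabulated in Table 4.12.
NODES_3 = (-0.7745966692, 0.0, 0.7745966692)
WEIGHTS_3 = (0.5555555556, 0.8888888889, 0.5555555556)

def gauss3(f):
    """Three-point Gaussian quadrature on [-1, 1]."""
    return sum(c * f(x) for x, c in zip(NODES_3, WEIGHTS_3))

approx = gauss3(lambda x: math.exp(x) * math.cos(x))

# Exact value from the antiderivative e^x (sin x + cos x) / 2.
s1, c1 = math.sin(1.0), math.cos(1.0)
exact = (math.e * (s1 + c1) - (c1 - s1) / math.e) / 2.0

print(f"approximation = {approx:.7f}")
print(f"exact value   = {exact:.7f}")
print(f"error         = {abs(approx - exact):.2e}")
```

Only three function evaluations are used, yet the error is already of order $10^{-5}$.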


Copyright 2010 Cengage Learning. All Rights Reserved. May not be copied, scanned, or duplicated, in whole or in part. Due to electronic rights, some third party content may be suppressed from the eBook and/or eChapter(s). 


Editorial review has deemed that any suppressed content does not materially affect the overall learning experience. Cengage Learning reserves the right to remove additional content at any time if subsequent rights restrictions require it. 




Gaussian Quadrature on Arbitrary Intervals 


An integral $\int_a^b f(x)\,dx$ over an arbitrary $[a, b]$ can be transformed into an integral over $[-1, 1]$ by using the change of variables (see Figure 4.17):

$$t = \frac{2x - a - b}{b - a} \iff x = \frac{1}{2}\left[(b - a)t + a + b\right].$$

Figure 4.17

This permits Gaussian quadrature to be applied to any interval $[a, b]$, because

$$\int_a^b f(x)\,dx = \int_{-1}^1 f\left(\frac{(b - a)t + (b + a)}{2}\right)\frac{(b - a)}{2}\,dt. \tag{4.41}$$
a —l1 


Example 2 Consider the integral $\int_1^3 \left(x^6 - x^2 \sin(2x)\right) dx = 317.3442466$.

(a) Compare the results for the closed Newton-Cotes formula with n = 1, the open Newton-Cotes formula with n = 1, and Gaussian Quadrature when n = 2.

(b) Compare the results for the closed Newton-Cotes formula with n = 2, the open Newton-Cotes formula with n = 2, and Gaussian Quadrature when n = 3.

Solution (a) Each of the formulas in this part requires 2 evaluations of the function $f(x) = x^6 - x^2 \sin(2x)$. The Newton-Cotes approximations are

Closed n = 1: $\frac{2}{2}\left[f(1) + f(3)\right] = 731.6054420$;

Open n = 1: $\frac{3(2/3)}{2}\left[f(5/3) + f(7/3)\right] = 188.7856682$.

Gaussian quadrature applied to this problem requires that the integral first be transformed into a problem whose interval of integration is $[-1, 1]$. Using Eq. (4.41) gives

$$\int_1^3 \left(x^6 - x^2 \sin(2x)\right) dx = \int_{-1}^1 \left((t + 2)^6 - (t + 2)^2 \sin(2(t + 2))\right) dt.$$

Gaussian quadrature with n = 2 then gives

$$\int_1^3 \left(x^6 - x^2 \sin(2x)\right) dx \approx f(-0.5773502692 + 2) + f(0.5773502692 + 2) = 306.8199344.$$

(b) Each of the formulas in this part requires 3 function evaluations. The Newton-Cotes approximations are

Closed n = 2: $\frac{1}{3}\left[f(1) + 4f(2) + f(3)\right] = 333.2380940$;

Open n = 2: $\frac{4(1/2)}{3}\left[2f(1.5) - f(2) + 2f(2.5)\right] = 303.5912023$.

Gaussian quadrature with n = 3, once the transformation has been done, gives

$$\int_1^3 \left(x^6 - x^2 \sin(2x)\right) dx \approx 0.\overline{5}\,f(-0.7745966692 + 2) + 0.\overline{8}\,f(2) + 0.\overline{5}\,f(0.7745966692 + 2) = 317.2641516.$$

The Gaussian quadrature results are clearly superior in each instance.
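The change of variables in Eq. (4.41) is straightforward to code. The Python sketch below applies it with the n = 2 and n = 3 entries of Table 4.12; the dictionary `TABLE` and the function name `gauss_on_interval` are assumptions made for this illustration.

```python
import math

# (root, coefficient) pairs for n = 2 and n = 3 from Table 4.12.
TABLE = {
    2: ((0.5773502692, 1.0), (-0.5773502692, 1.0)),
    3: ((0.7745966692, 0.5555555556),
        (0.0, 0.8888888889),
        (-0.7745966692, 0.5555555556)),
}

def gauss_on_interval(f, a, b, n):
    """Gaussian quadrature on [a, b] via x = ((b - a) t + a + b) / 2."""
    half = (b - a) / 2.0
    mid = (a + b) / 2.0
    return half * sum(c * f(half * t + mid) for t, c in TABLE[n])

def f(x):
    return x**6 - x**2 * math.sin(2.0 * x)

print(gauss_on_interval(f, 1.0, 3.0, 2))   # n = 2 result
print(gauss_on_interval(f, 1.0, 3.0, 3))   # n = 3 result
```

On [1, 3] the factor (b − a)/2 equals 1, which is why it does not appear explicitly in the hand computation.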


Maple has Composite Gaussian Quadrature in the NumericalAnalysis subpackage of Maple’s Student package. The default for the number of partitions in the command is 10, so the results in Example 2 would be found for n = 2 with

f := x^6 - x^2*sin(2*x); a := 1; b := 3:
Quadrature(f(x), x = a..b, method = gaussian[2], partition = 1, output = information)

which returns the approximation, what Maple assumes is the exact value of the integral, the absolute and relative errors in the approximations, and the number of function evaluations.

The result when n = 3 is, of course, obtained by replacing the statement method = gaussian[2] with method = gaussian[3].


EXERCISE SET 4.7


1. Approximate the following integrals using Gaussian quadrature with n = 2, and compare your results 
to the exact values of the integrals. 


15 
a. i x? Inx dx 
1 
035° 9 x/4 
c. / dx d. / x’ sinx dx 
0) ve —4 0 


x/4 ia 16 9y 
e. e sin 2x dx f. a dx 
0 1 x27—4 


m/4 
/ (cos x)? dx 
0 


= 
o—_ 

& 

nN 
% 

i] 

« 
> 


> 


3.5 re 
: ———— dx 

I Vx? — 4 

2. Repeat Exercise 1 with n = 3.

3. Repeat Exercise 1 with n = 4.

4. Repeat Exercise 1 with n = 5.

5. Determine constants a, b, c, and d that will produce a quadrature formula

$$\int_{-1}^1 f(x)\,dx = af(-1) + bf(1) + cf'(-1) + df'(1)$$

that has degree of precision 3.

6. Determine constants a, b, c, d, and e that will produce a quadrature formula

$$\int_{-1}^1 f(x)\,dx = af(-1) + bf(0) + cf(1) + df'(-1) + ef'(1)$$

that has degree of precision 4.

7. Verify the entries for the values of n = 2 and 3 in Table 4.12 by finding the roots of the respective Legendre polynomials, and use the equations preceding this table to find the coefficients associated with the values.

8. Show that the formula $Q(P) = \sum_{i=1}^n c_i P(x_i)$ cannot have degree of precision greater than 2n − 1, regardless of the choice of $c_1, \ldots, c_n$ and $x_1, \ldots, x_n$. [Hint: Construct a polynomial that has a double root at each of the $x_i$’s.]

9. Apply Maple’s Composite Gaussian Quadrature routine to approximate $\int_{-1}^1 x^2 e^x\,dx$ in the following manner.

a. Use Gaussian Quadrature with n = 8 on the single interval [−1, 1].

b. Use Gaussian Quadrature with n = 4 on the intervals [−1, 0] and [0, 1].

c. Use Gaussian Quadrature with n = 2 on the intervals [−1, −0.5], [−0.5, 0], [0, 0.5] and [0.5, 1].

d. Give an explanation for the accuracy of the results.


4.8 Multiple Integrals


The techniques discussed in the previous sections can be modified for use in the approximation of multiple integrals. Consider the double integral

$$\iint_R f(x, y)\,dA,$$

where $R = \{(x, y) \mid a \le x \le b,\ c \le y \le d\}$, for some constants a, b, c, and d, is a rectangular region in the plane. (See Figure 4.18.)


Figure 4.18 


The following illustration shows how the Composite Trapezoidal rule using two subin- 
tervals in each coordinate direction would be applied to this integral. 


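Illustration Writing the double integral as an iterated integral gives

$$\iint_R f(x, y)\,dA = \int_a^b \left(\int_c^d f(x, y)\,dy\right) dx.$$

To simplify notation, let k = (d − c)/2 and h = (b − a)/2. Apply the Composite Trapezoidal rule to the interior integral to obtain

$$\int_c^d f(x, y)\,dy \approx \frac{k}{2}\left[f(x, c) + f(x, d) + 2f\left(x, \frac{c + d}{2}\right)\right].$$

This approximation is of order $O\left((d - c)^3\right)$. Then apply the Composite Trapezoidal rule again to approximate the integral of this function of x:

$$\int_a^b \left(\int_c^d f(x, y)\,dy\right) dx \approx \int_a^b \left(\frac{d - c}{4}\right)\left[f(x, c) + 2f\left(x, \frac{c + d}{2}\right) + f(x, d)\right] dx$$

$$= \frac{b - a}{4}\left(\frac{d - c}{4}\right)\left[f(a, c) + 2f\left(a, \frac{c + d}{2}\right) + f(a, d)\right]$$

$$+ \frac{b - a}{4}\left(2\left(\frac{d - c}{4}\right)\left[f\left(\frac{a + b}{2}, c\right) + 2f\left(\frac{a + b}{2}, \frac{c + d}{2}\right) + f\left(\frac{a + b}{2}, d\right)\right]\right)$$

$$+ \frac{b - a}{4}\left(\frac{d - c}{4}\right)\left[f(b, c) + 2f\left(b, \frac{c + d}{2}\right) + f(b, d)\right]$$

$$= \frac{(b - a)(d - c)}{16}\left[f(a, c) + f(a, d) + f(b, c) + f(b, d) + 2\left(f\left(\frac{a + b}{2}, c\right) + f\left(\frac{a + b}{2}, d\right) + f\left(a, \frac{c + d}{2}\right) + f\left(b, \frac{c + d}{2}\right)\right) + 4f\left(\frac{a + b}{2}, \frac{c + d}{2}\right)\right].$$

This approximation is of order $O\left((b - a)(d - c)\left[(b - a)^2 + (d - c)^2\right]\right)$. Figure 4.19 shows a grid with the number of functional evaluations at each of the nodes used in the approximation.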
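The final formula of the illustration, with weights 1 at the corners, 2 at the edge midpoints, and 4 at the center, can be sketched in Python as follows. The function name `trapezoid_2x2` is chosen here for illustration only.

```python
def trapezoid_2x2(f, a, b, c, d):
    """Composite Trapezoidal rule with two subintervals in each direction
    on the rectangle [a, b] x [c, d]."""
    xm = (a + b) / 2.0
    ym = (c + d) / 2.0
    corners = f(a, c) + f(a, d) + f(b, c) + f(b, d)
    edges = f(xm, c) + f(xm, d) + f(a, ym) + f(b, ym)
    center = f(xm, ym)
    return (b - a) * (d - c) / 16.0 * (corners + 2.0 * edges + 4.0 * center)

# The rule is exact for functions that are linear in each variable.
print(trapezoid_2x2(lambda x, y: x * y, 0.0, 1.0, 0.0, 1.0))        # exact value 1/4
print(trapezoid_2x2(lambda x, y: x + 2.0 * y, 0.0, 2.0, 0.0, 1.0))  # exact value 4
```

Nine function evaluations are needed, compared with three for the corresponding single integral.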


Figure 4.19

As the illustration shows, the procedure is quite straightforward. But the number of
function evaluations grows with the square of the number required for a single integral. In 
a practical situation we would not expect to use a method as elementary as the Composite 
Trapezoidal rule. Instead we will employ the Composite Simpson’s rule to illustrate the 
general approximation technique, although any other composite formula could be used in 
its place. 

To apply the Composite Simpson’s rule, we divide the region R by partitioning both $[a, b]$ and $[c, d]$ into an even number of subintervals. To simplify the notation, we choose even integers n and m and partition $[a, b]$ and $[c, d]$ with the evenly spaced mesh points $x_0, x_1, \ldots, x_n$ and $y_0, y_1, \ldots, y_m$, respectively. These subdivisions determine step sizes h = (b − a)/n and k = (d − c)/m. Writing the double integral as the iterated integral

$$\iint_R f(x, y)\,dA = \int_a^b \left(\int_c^d f(x, y)\,dy\right) dx,$$

we first use the Composite Simpson’s rule to approximate

$$\int_c^d f(x, y)\,dy,$$

treating x as a constant.
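Let $y_j = c + jk$, for each $j = 0, 1, \ldots, m$. Then

$$\int_c^d f(x, y)\,dy = \frac{k}{3}\left[f(x, y_0) + 2\sum_{j=1}^{(m/2)-1} f(x, y_{2j}) + 4\sum_{j=1}^{m/2} f(x, y_{2j-1}) + f(x, y_m)\right] - \frac{(d - c)k^4}{180}\frac{\partial^4 f(x, \mu)}{\partial y^4},$$

for some $\mu$ in $(c, d)$. Thus

$$\int_a^b \int_c^d f(x, y)\,dy\,dx = \frac{k}{3}\left[\int_a^b f(x, y_0)\,dx + 2\sum_{j=1}^{(m/2)-1} \int_a^b f(x, y_{2j})\,dx + 4\sum_{j=1}^{m/2} \int_a^b f(x, y_{2j-1})\,dx + \int_a^b f(x, y_m)\,dx\right] - \frac{(d - c)k^4}{180}\int_a^b \frac{\partial^4 f(x, \mu)}{\partial y^4}\,dx.$$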


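Composite Simpson’s rule is now employed on the integrals in this equation. Let $x_i = a + ih$, for each $i = 0, 1, \ldots, n$. Then for each $j = 0, 1, \ldots, m$, we have

$$\int_a^b f(x, y_j)\,dx = \frac{h}{3}\left[f(x_0, y_j) + 2\sum_{i=1}^{(n/2)-1} f(x_{2i}, y_j) + 4\sum_{i=1}^{n/2} f(x_{2i-1}, y_j) + f(x_n, y_j)\right] - \frac{(b - a)h^4}{180}\frac{\partial^4 f}{\partial x^4}(\xi_j, y_j),$$

for some $\xi_j$ in $(a, b)$. The resulting approximation has the form

$$\int_a^b \int_c^d f(x, y)\,dy\,dx \approx \frac{hk}{9}\Bigg\{\left[f(x_0, y_0) + 2\sum_{i=1}^{(n/2)-1} f(x_{2i}, y_0) + 4\sum_{i=1}^{n/2} f(x_{2i-1}, y_0) + f(x_n, y_0)\right]$$

$$+ 2\left[\sum_{j=1}^{(m/2)-1} f(x_0, y_{2j}) + 2\sum_{j=1}^{(m/2)-1}\sum_{i=1}^{(n/2)-1} f(x_{2i}, y_{2j}) + 4\sum_{j=1}^{(m/2)-1}\sum_{i=1}^{n/2} f(x_{2i-1}, y_{2j}) + \sum_{j=1}^{(m/2)-1} f(x_n, y_{2j})\right]$$

$$+ 4\left[\sum_{j=1}^{m/2} f(x_0, y_{2j-1}) + 2\sum_{j=1}^{m/2}\sum_{i=1}^{(n/2)-1} f(x_{2i}, y_{2j-1}) + 4\sum_{j=1}^{m/2}\sum_{i=1}^{n/2} f(x_{2i-1}, y_{2j-1}) + \sum_{j=1}^{m/2} f(x_n, y_{2j-1})\right]$$

$$+ \left[f(x_0, y_m) + 2\sum_{i=1}^{(n/2)-1} f(x_{2i}, y_m) + 4\sum_{i=1}^{n/2} f(x_{2i-1}, y_m) + f(x_n, y_m)\right]\Bigg\}.$$

The error term E is given by

$$E = \frac{-k(b - a)h^4}{540}\left[\frac{\partial^4 f}{\partial x^4}(\xi_0, y_0) + 2\sum_{j=1}^{(m/2)-1} \frac{\partial^4 f}{\partial x^4}(\xi_{2j}, y_{2j}) + 4\sum_{j=1}^{m/2} \frac{\partial^4 f}{\partial x^4}(\xi_{2j-1}, y_{2j-1}) + \frac{\partial^4 f}{\partial x^4}(\xi_m, y_m)\right] - \frac{(d - c)k^4}{180}\int_a^b \frac{\partial^4 f(x, \mu)}{\partial y^4}\,dx.$$

If $\partial^4 f/\partial x^4$ is continuous, the Intermediate Value Theorem 1.11 can be repeatedly applied to show that the evaluation of the partial derivatives with respect to x can be replaced by a common value and that

$$E = \frac{-k(b - a)h^4}{540}\left[3m\frac{\partial^4 f}{\partial x^4}(\bar{\eta}, \bar{\mu})\right] - \frac{(d - c)k^4}{180}\int_a^b \frac{\partial^4 f(x, \mu)}{\partial y^4}\,dx,$$

for some $(\bar{\eta}, \bar{\mu})$ in R. If $\partial^4 f/\partial y^4$ is also continuous, the Weighted Mean Value Theorem for Integrals 1.13 implies that

$$\int_a^b \frac{\partial^4 f(x, \mu)}{\partial y^4}\,dx = (b - a)\frac{\partial^4 f}{\partial y^4}(\hat{\eta}, \hat{\mu}),$$

for some $(\hat{\eta}, \hat{\mu})$ in R. Because m = (d − c)/k, the error term has the form

$$E = \frac{-k(b - a)h^4}{540}\left[3m\frac{\partial^4 f}{\partial x^4}(\bar{\eta}, \bar{\mu})\right] - \frac{(d - c)(b - a)}{180}k^4\frac{\partial^4 f}{\partial y^4}(\hat{\eta}, \hat{\mu}),$$

which simplifies to

$$E = -\frac{(d - c)(b - a)}{180}\left[h^4\frac{\partial^4 f}{\partial x^4}(\bar{\eta}, \bar{\mu}) + k^4\frac{\partial^4 f}{\partial y^4}(\hat{\eta}, \hat{\mu})\right],$$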
for some $(\bar{\eta}, \bar{\mu})$ and $(\hat{\eta}, \hat{\mu})$ in R.

Example 1 Use Composite Simpson’s rule with n = 4 and m = 2 to approximate

$$\int_{1.4}^{2.0}\int_{1.0}^{1.5} \ln(x + 2y)\,dy\,dx.$$

Solution The step sizes for this application are h = (2.0 − 1.4)/4 = 0.15 and k = (1.5 − 1.0)/2 = 0.25. The region of integration R is shown in Figure 4.20, together with the nodes $(x_i, y_j)$, where i = 0, 1, 2, 3, 4 and j = 0, 1, 2. It also shows the coefficients $w_{i,j}$ of $f(x_i, y_j) = \ln(x_i + 2y_j)$ in the sum that gives the Composite Simpson’s rule approximation to the integral.

Figure 4.20

The approximation is

$$\int_{1.4}^{2.0}\int_{1.0}^{1.5} \ln(x + 2y)\,dy\,dx \approx \frac{(0.15)(0.25)}{9}\sum_{i=0}^{4}\sum_{j=0}^{2} w_{i,j}\ln(x_i + 2y_j) = 0.4295524387.$$

We have

$$\frac{\partial^4 f}{\partial x^4}(x, y) = \frac{-6}{(x + 2y)^4} \quad\text{and}\quad \frac{\partial^4 f}{\partial y^4}(x, y) = \frac{-96}{(x + 2y)^4},$$

and the maximum values of the absolute values of these partial derivatives occur on R when x = 1.4 and y = 1.0. So the error is bounded by

$$|E| \le \frac{(0.5)(0.6)}{180}\left[(0.15)^4 \max_{(x,y)\text{ in }R}\frac{6}{(x + 2y)^4} + (0.25)^4 \max_{(x,y)\text{ in }R}\frac{96}{(x + 2y)^4}\right] \le 4.72 \times 10^{-6}.$$

The actual value of the integral to ten decimal places is

$$\int_{1.4}^{2.0}\int_{1.0}^{1.5} \ln(x + 2y)\,dy\,dx = 0.4295545265,$$

so the approximation is accurate to within $2.1 \times 10^{-6}$.
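The weighted double sum used in this example can be sketched in Python by forming the one-dimensional Simpson weights 1, 4, 2, ..., 4, 1 in each direction and multiplying them. The function names `simpson_weights` and `simpson_double` are assumptions made for this illustration.

```python
import math

def simpson_weights(n):
    """Composite Simpson weights 1, 4, 2, 4, ..., 2, 4, 1 for even n."""
    return [1 if i in (0, n) else (4 if i % 2 else 2) for i in range(n + 1)]

def simpson_double(f, a, b, c, d, n, m):
    """Composite Simpson's rule on [a, b] x [c, d] with n subintervals
    in x and m subintervals in y (both even)."""
    h = (b - a) / n
    k = (d - c) / m
    wx = simpson_weights(n)
    wy = simpson_weights(m)
    total = 0.0
    for i in range(n + 1):
        for j in range(m + 1):
            # The coefficient w_{i,j} is the product of the 1-D weights.
            total += wx[i] * wy[j] * f(a + i * h, c + j * k)
    return h * k / 9.0 * total

approx = simpson_double(lambda x, y: math.log(x + 2.0 * y),
                        1.4, 2.0, 1.0, 1.5, n=4, m=2)
print(f"{approx:.10f}")
```

With n = 4 and m = 2 the double loop visits the 15 nodes shown in the figure.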

The same techniques can be applied for the approximation of triple integrals as well as 
higher integrals for functions of more than three variables. The number of functional evalu- 
ations required for the approximation is the product of the number of functional evaluations 


required when the method is applied to each variable. 
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Example 2 


Numerical Differentiation and Integration 


Gaussian Quadrature for Double Integral Approximation 


To reduce the number of functional evaluations, more efficient methods such as Gaussian 
quadrature, Romberg integration, or Adaptive quadrature can be incorporated in place of the 
Newton-Cotes formulas. The following example illustrates the use of Gaussian quadrature 
for the integral considered in Example 1. 


Use Gaussian quadrature with n = 3 in both dimensions to approximate the integral 


2.0 pls 
i / In(x + 2y) dy dx. 
14 J1.0 


Solution Before employing Gaussian quadrature to approximate this integral, we need to 
transform the region of integration 


R={Q,y)|14<2x*<20,10<y< 1.5} 
into 

R={(u,v)|-1 <u<1,-1<v <1}. 
The linear transformations that accomplish this are 


1 1 
= 2x—-14-2.0) and v= 
20-14 y and: V= F510 


u (2y — 1.0— 1.5), 


or, equivalently, x = 0.3u+ 1.7 and y = 0.25v 4+ 1.25. Employing this change of variables 
gives an integral on which Gaussian quadrature can be applied: 


2.0 pls 1 pl 
/ / In(x + 2y) dy dx = 0.075 [ i In(0.3u + 0.5v + 4.2) du du. 
14 J10 -1J-1 


The Gaussian quadrature formula for n = 3 in both u and v requires that we use the nodes

\[ u_1 = v_1 = r_{3,2} = 0, \quad u_0 = v_0 = r_{3,1} = -0.7745966692, \]

and

\[ u_2 = v_2 = r_{3,3} = 0.7745966692. \]

The associated weights are $c_{3,2} = 0.\overline{8}$ and $c_{3,1} = c_{3,3} = 0.\overline{5}$. (These are given in Table 4.12
on page 232.) The resulting approximation is

\[ \int_{1.4}^{2.0} \int_{1.0}^{1.5} \ln(x + 2y)\, dy\, dx \approx 0.075 \sum_{i=1}^{3} \sum_{j=1}^{3} c_{3,i}\, c_{3,j} \ln(0.3 r_{3,i} + 0.5 r_{3,j} + 4.2) = 0.4295545313. \]

Although this result requires only 9 functional evaluations compared to 15 for the Composite
Simpson's rule considered in Example 1, it is accurate to within $4.8 \times 10^{-9}$, compared to
$2.1 \times 10^{-6}$ accuracy in Example 1.
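The computation in this example can be sketched in a few lines of Python. This is only an illustrative sketch: the function name `gauss3_rectangle` is hypothetical (it does not appear in the text), and the nodes and weights are the exact three-point values $\pm\sqrt{3/5}$, 0 and 5/9, 8/9, 5/9, whose decimal forms are the entries quoted from Table 4.12.

```python
import math

# Three-point Gauss-Legendre nodes and weights on [-1, 1].
NODES = [-math.sqrt(0.6), 0.0, math.sqrt(0.6)]
WEIGHTS = [5.0 / 9.0, 8.0 / 9.0, 5.0 / 9.0]


def gauss3_rectangle(f, a, b, c, d):
    """Approximate the integral of f(x, y) over [a, b] x [c, d]
    with the 3-point Gaussian rule in each variable."""
    hx, mx = (b - a) / 2.0, (b + a) / 2.0   # x = hx*u + mx
    hy, my = (d - c) / 2.0, (d + c) / 2.0   # y = hy*v + my
    total = 0.0
    for u, cu in zip(NODES, WEIGHTS):
        for v, cv in zip(NODES, WEIGHTS):
            total += cu * cv * f(hx * u + mx, hy * v + my)
    return hx * hy * total                   # Jacobian of the change of variables


approx = gauss3_rectangle(lambda x, y: math.log(x + 2.0 * y), 1.4, 2.0, 1.0, 1.5)
print(approx)
```

For this region the factor `hx * hy` equals 0.3 · 0.25 = 0.075, the constant that multiplies the transformed integral above.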
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Non-Rectangular Regions 


The use of approximation methods for double integrals is not limited to integrals with 
rectangular regions of integration. The techniques previously discussed can be modified to 
approximate double integrals of the form 


\[ \int_a^b \int_{c(x)}^{d(x)} f(x, y)\, dy\, dx \tag{4.42} \]

or

\[ \int_c^d \int_{a(y)}^{b(y)} f(x, y)\, dx\, dy. \tag{4.43} \]


In fact, integrals on regions not of this type can also be approximated by performing appro- 
priate partitions of the region. (See Exercise 10.) 
To describe the technique involved with approximating an integral in the form

\[ \int_a^b \int_{c(x)}^{d(x)} f(x, y)\, dy\, dx, \]

we will use the basic Simpson's rule to integrate with respect to both variables. The
step size for the variable x is h = (b - a)/2, but the step size for y varies with x (see
Figure 4.21) and is written

\[ k(x) = \frac{d(x) - c(x)}{2}. \]

Figure 4.21
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This gives 


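\[
\begin{aligned}
\int_a^b \int_{c(x)}^{d(x)} f(x, y)\, dy\, dx
&\approx \int_a^b \frac{k(x)}{3} \bigl[ f(x, c(x)) + 4 f(x, c(x) + k(x)) + f(x, d(x)) \bigr]\, dx \\
&\approx \frac{h}{3} \Bigl\{ \frac{k(a)}{3} \bigl[ f(a, c(a)) + 4 f(a, c(a) + k(a)) + f(a, d(a)) \bigr] \\
&\qquad + \frac{4k(a+h)}{3} \bigl[ f(a+h, c(a+h)) + 4 f(a+h, c(a+h) + k(a+h)) + f(a+h, d(a+h)) \bigr] \\
&\qquad + \frac{k(b)}{3} \bigl[ f(b, c(b)) + 4 f(b, c(b) + k(b)) + f(b, d(b)) \bigr] \Bigr\}.
\end{aligned}
\]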


Algorithm 4.4 applies the Composite Simpson’s rule to an integral in the form (4.42). 
Integrals in the form (4.43) can, of course, be handled similarly. 


Simpson’s Double Integral 


To approximate the integral

\[ I = \int_a^b \int_{c(x)}^{d(x)} f(x, y)\, dy\, dx: \]

INPUT endpoints a, b; even positive integers m, n.
OUTPUT approximation J to I.

Step 1 Set h = (b - a)/n;
           J_1 = 0; (End terms.)
           J_2 = 0; (Even terms.)
           J_3 = 0. (Odd terms.)

Step 2 For i = 0, 1, ..., n do Steps 3-8.
    Step 3 Set x = a + ih; (Composite Simpson's method for x.)
               HX = (d(x) - c(x))/m;
               K_1 = f(x, c(x)) + f(x, d(x)); (End terms.)
               K_2 = 0; (Even terms.)
               K_3 = 0. (Odd terms.)

    Step 4 For j = 1, 2, ..., m - 1 do Steps 5 and 6.
        Step 5 Set y = c(x) + jHX;
                   Q = f(x, y).
        Step 6 If j is even then set K_2 = K_2 + Q
                   else set K_3 = K_3 + Q.

    Step 7 Set L = (K_1 + 2K_2 + 4K_3)HX/3.
               ($L \approx \int_{c(x_i)}^{d(x_i)} f(x_i, y)\, dy$ by the Composite Simpson's method.)

    Step 8 If i = 0 or i = n then set J_1 = J_1 + L
               else if i is even then set J_2 = J_2 + L
               else set J_3 = J_3 + L.

Step 9 Set J = h(J_1 + 2J_2 + 4J_3)/3.

Step 10 OUTPUT (J);
            STOP.

The reduced calculation makes it generally worthwhile to apply Gaussian quadrature rather than a Simpson's technique when approximating double integrals.
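The steps of Algorithm 4.4 translate almost line for line into code. The following is a minimal sketch, not the text's own program; the name `simpson_double` and the convention of passing c and d as Python callables are assumptions made for illustration.

```python
import math


def simpson_double(f, a, b, c, d, n, m):
    """Composite Simpson's rule for the integral of f(x, y) over
    a <= x <= b, c(x) <= y <= d(x); n and m must be even."""
    if n <= 0 or m <= 0 or n % 2 or m % 2:
        raise ValueError("n and m must be positive even integers")
    h = (b - a) / n
    j_end = j_even = j_odd = 0.0
    for i in range(n + 1):
        x = a + i * h
        hx = (d(x) - c(x)) / m               # step size in y depends on x
        k_end = f(x, c(x)) + f(x, d(x))
        k_even = k_odd = 0.0
        for j in range(1, m):
            q = f(x, c(x) + j * hx)
            if j % 2 == 0:
                k_even += q
            else:
                k_odd += q
        # Inner Simpson sum: approximates the y-integral at this x.
        inner = (k_end + 2.0 * k_even + 4.0 * k_odd) * hx / 3.0
        if i == 0 or i == n:
            j_end += inner
        elif i % 2 == 0:
            j_even += inner
        else:
            j_odd += inner
    return h * (j_end + 2.0 * j_even + 4.0 * j_odd) / 3.0


# Integrand and region: e^(y/x) for 0.1 <= x <= 0.5, x^3 <= y <= x^2.
volume = simpson_double(lambda x, y: math.exp(y / x), 0.1, 0.5,
                        lambda x: x ** 3, lambda x: x ** 2, 10, 10)
print(volume)
```

With n = m = 10 the function f is evaluated at (n + 1)(m + 1) = 121 points.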


To apply Gaussian quadrature to the double integral

\[ \int_a^b \int_{c(x)}^{d(x)} f(x, y)\, dy\, dx, \]

first requires transforming, for each x in [a, b], the variable y in the interval [c(x), d(x)] into
the variable t in the interval [-1, 1]. This linear transformation gives

\[ f(x, y) = f\!\left( x, \frac{(d(x) - c(x))t + d(x) + c(x)}{2} \right) \quad \text{and} \quad dy = \frac{d(x) - c(x)}{2}\, dt. \]

Then, for each x in [a, b], we apply Gaussian quadrature to the resulting integral

\[ \int_{c(x)}^{d(x)} f(x, y)\, dy = \int_{-1}^{1} f\!\left( x, \frac{(d(x) - c(x))t + d(x) + c(x)}{2} \right) \frac{d(x) - c(x)}{2}\, dt \]

to produce

\[ \int_a^b \int_{c(x)}^{d(x)} f(x, y)\, dy\, dx \approx \int_a^b \frac{d(x) - c(x)}{2} \sum_{j=1}^{n} c_{n,j}\, f\!\left( x, \frac{(d(x) - c(x)) r_{n,j} + d(x) + c(x)}{2} \right) dx, \]

where, as before, the roots $r_{n,j}$ and coefficients $c_{n,j}$ come from Table 4.12 on page 232.
Now the interval [a,b] is transformed to [—1,1], and Gaussian quadrature is applied 
to approximate the integral on the right side of this equation. The details are given in 
Algorithm 4.5. 


Gaussian Double Integral 


To approximate the integral

\[ I = \int_a^b \int_{c(x)}^{d(x)} f(x, y)\, dy\, dx: \]

INPUT endpoints a, b; positive integers m, n.
      (The roots r_{i,j} and coefficients c_{i,j} need to be available for i = max{m, n}
      and for 1 <= j <= i.)

OUTPUT approximation J to I.

Step 1 Set h_1 = (b - a)/2;
           h_2 = (b + a)/2;
           J = 0.

Step 2 For i = 1, 2, ..., m do Steps 3-5.
    Step 3 Set JX = 0;
               x = h_1 r_{m,i} + h_2;
               d_1 = d(x);
               c_1 = c(x);
               k_1 = (d_1 - c_1)/2;
               k_2 = (d_1 + c_1)/2.
    Step 4 For j = 1, 2, ..., n do
               set y = k_1 r_{n,j} + k_2;
                   Q = f(x, y);
                   JX = JX + c_{n,j} Q.
    Step 5 Set J = J + c_{m,i} k_1 JX.

Step 6 Set J = h_1 J.

Step 7 OUTPUT (J);
           STOP.
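Algorithm 4.5 can be sketched as follows. Two assumptions are worth flagging: instead of reading roots and coefficients from Table 4.12, the hypothetical helper `gauss_legendre` recomputes them by Newton's method on the Legendre polynomial, and the name `gauss_double` is likewise illustrative rather than taken from the text.

```python
import math


def gauss_legendre(n):
    """Nodes and weights of the n-point Gauss-Legendre rule on [-1, 1],
    found by Newton's method on the Legendre polynomial P_n."""
    nodes, weights = [], []
    for i in range(1, n + 1):
        x = math.cos(math.pi * (i - 0.25) / (n + 0.5))    # initial guess
        for _ in range(100):
            p_prev, p = 1.0, x                             # P_0(x), P_1(x)
            for k in range(2, n + 1):                      # three-term recurrence
                p_prev, p = p, ((2 * k - 1) * x * p - (k - 1) * p_prev) / k
            dp = n * (x * p - p_prev) / (x * x - 1.0)      # P_n'(x)
            step = p / dp
            x -= step
            if abs(step) < 1e-15:
                break
        nodes.append(x)
        weights.append(2.0 / ((1.0 - x * x) * dp * dp))
    return nodes, weights


def gauss_double(f, a, b, c, d, m, n):
    """Gaussian quadrature for the integral of f(x, y) over
    a <= x <= b, c(x) <= y <= d(x), with m nodes in x and n in y."""
    r_m, c_m = gauss_legendre(m)
    r_n, c_n = gauss_legendre(n)
    h1, h2 = (b - a) / 2.0, (b + a) / 2.0
    total = 0.0
    for ri, ci in zip(r_m, c_m):
        x = h1 * ri + h2
        k1, k2 = (d(x) - c(x)) / 2.0, (d(x) + c(x)) / 2.0
        jx = 0.0
        for rj, cj in zip(r_n, c_n):
            jx += cj * f(x, k1 * rj + k2)
        total += ci * k1 * jx
    return h1 * total


volume = gauss_double(lambda x, y: math.exp(y / x), 0.1, 0.5,
                      lambda x: x ** 3, lambda x: x ** 2, 5, 5)
print(volume)
```

The call uses m = n = 5, so f is evaluated only 25 times.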


Illustration The volume of the solid in Figure 4.22 is approximated by applying Simpson's Double
Integral Algorithm with n = m = 10 to

\[ \int_{0.1}^{0.5} \int_{x^3}^{x^2} e^{y/x}\, dy\, dx. \]

This requires 121 evaluations of the function $f(x, y) = e^{y/x}$ and produces the value
0.0333054, which approximates the volume of the solid shown in Figure 4.22 to nearly
seven decimal places. Applying the Gaussian Quadrature Algorithm with n = m = 5 requires
only 25 function evaluations and gives the approximation 0.03330556611, which is
accurate to 11 decimal places.


Figure 4.22 [The solid under the surface $z = e^{y/x}$ over the region between $y = x^3$ and $y = x^2$ for $0.1 \le x \le 0.5$.]
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Triple Integral Approximation 


Triple integrals of the form

\[ \int_a^b \int_{c(x)}^{d(x)} \int_{\alpha(x,y)}^{\beta(x,y)} f(x, y, z)\, dz\, dy\, dx \]

(see Figure 4.23) are approximated in a similar manner. Because of the number of calculations
involved, Gaussian quadrature is the method of choice. Algorithm 4.6 implements
this procedure.

The reduced calculation makes it almost always worthwhile to apply Gaussian quadrature rather
than a Simpson's technique when approximating triple or higher integrals.


Figure 4.23 [Region of integration, with lower bounding surface $z = \alpha(x, y)$.]


Gaussian Triple Integral 


To approximate the integral

\[ I = \int_a^b \int_{c(x)}^{d(x)} \int_{\alpha(x,y)}^{\beta(x,y)} f(x, y, z)\, dz\, dy\, dx: \]

INPUT endpoints a, b; positive integers m, n, p.
      (The roots r_{i,j} and coefficients c_{i,j} need to be available for i = max{n, m, p}
      and for 1 <= j <= i.)

OUTPUT approximation J to I.

Step 1 Set h_1 = (b - a)/2;
           h_2 = (b + a)/2;
           J = 0.

Step 2 For i = 1, 2, ..., m do Steps 3-8.
    Step 3 Set JX = 0;
               x = h_1 r_{m,i} + h_2;
               d_1 = d(x);
               c_1 = c(x);
               k_1 = (d_1 - c_1)/2;
               k_2 = (d_1 + c_1)/2.
    Step 4 For j = 1, 2, ..., n do Steps 5-7.
        Step 5 Set JY = 0;
                   y = k_1 r_{n,j} + k_2;
                   beta_1 = beta(x, y);
                   alpha_1 = alpha(x, y);
                   l_1 = (beta_1 - alpha_1)/2;
                   l_2 = (beta_1 + alpha_1)/2.
        Step 6 For k = 1, 2, ..., p do
                   set z = l_1 r_{p,k} + l_2;
                       Q = f(x, y, z);
                       JY = JY + c_{p,k} Q.
        Step 7 Set JX = JX + c_{n,j} l_1 JY.
    Step 8 Set J = J + c_{m,i} k_1 JX.

Step 9 Set J = h_1 J.

Step 10 OUTPUT (J);
            STOP.
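A sketch of Algorithm 4.6 follows the same pattern with one more nested loop. As an assumption for illustration, the roots and coefficients are recomputed by a hypothetical helper `gauss_legendre` (Newton's method on the Legendre polynomial) rather than read from a table, and `gauss_triple` is an illustrative name. The sample call uses a polynomial case for which the rule with n = m = p = 4 is exact.

```python
import math


def gauss_legendre(n):
    """Nodes and weights of the n-point Gauss-Legendre rule on [-1, 1]."""
    nodes, weights = [], []
    for i in range(1, n + 1):
        x = math.cos(math.pi * (i - 0.25) / (n + 0.5))    # initial guess
        for _ in range(100):
            p_prev, p = 1.0, x                             # P_0(x), P_1(x)
            for k in range(2, n + 1):
                p_prev, p = p, ((2 * k - 1) * x * p - (k - 1) * p_prev) / k
            dp = n * (x * p - p_prev) / (x * x - 1.0)      # P_n'(x)
            step = p / dp
            x -= step
            if abs(step) < 1e-15:
                break
        nodes.append(x)
        weights.append(2.0 / ((1.0 - x * x) * dp * dp))
    return nodes, weights


def gauss_triple(f, a, b, c, d, alpha, beta, m, n, p):
    """Gaussian quadrature over a <= x <= b, c(x) <= y <= d(x),
    alpha(x, y) <= z <= beta(x, y), with m, n, p nodes in x, y, z."""
    r_m, c_m = gauss_legendre(m)
    r_n, c_n = gauss_legendre(n)
    r_p, c_p = gauss_legendre(p)
    h1, h2 = (b - a) / 2.0, (b + a) / 2.0
    total = 0.0
    for ri, ci in zip(r_m, c_m):
        x = h1 * ri + h2
        k1, k2 = (d(x) - c(x)) / 2.0, (d(x) + c(x)) / 2.0
        jx = 0.0
        for rj, cj in zip(r_n, c_n):
            y = k1 * rj + k2
            l1 = (beta(x, y) - alpha(x, y)) / 2.0
            l2 = (beta(x, y) + alpha(x, y)) / 2.0
            jy = 0.0
            for rk, ck in zip(r_p, c_p):
                jy += ck * f(x, y, l1 * rk + l2)
            jx += cj * l1 * jy
        total += ci * k1 * jx
    return h1 * total


# Integral of y over 0 <= x <= 1, x^2 <= y <= x, x - y <= z <= x + y.
value = gauss_triple(lambda x, y, z: y, 0.0, 1.0,
                     lambda x: x * x, lambda x: x,
                     lambda x, y: x - y, lambda x, y: x + y, 4, 4, 4)
print(value)
```

Integrating by hand gives 1/14 for this example, since the inner two integrals reduce to $\tfrac{2}{3}(x^3 - x^6)$.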


The following example requires the evaluation of four triple integrals. 


Illustration The center of mass of a solid region D with density function $\sigma$ occurs at

\[ (\bar{x}, \bar{y}, \bar{z}) = \left( \frac{M_{yz}}{M}, \frac{M_{xz}}{M}, \frac{M_{xy}}{M} \right), \]

where

\[ M_{yz} = \iiint_D x\, \sigma(x, y, z)\, dV, \qquad M_{xz} = \iiint_D y\, \sigma(x, y, z)\, dV \]

and

\[ M_{xy} = \iiint_D z\, \sigma(x, y, z)\, dV \]

are the moments about the coordinate planes and the mass of D is

\[ M = \iiint_D \sigma(x, y, z)\, dV. \]

The solid shown in Figure 4.24 is bounded by the upper nappe of the cone $z^2 = x^2 + y^2$ and
the plane z = 2. Suppose that this solid has density function given by

\[ \sigma(x, y, z) = \sqrt{x^2 + y^2}. \]
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Figure 4.24 


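Applying the Gaussian Triple Integral Algorithm 4.6 with n = m = p = 5 requires 125
function evaluations per integral and gives the following approximations:

\[
\begin{aligned}
M &= \int_{-2}^{2} \int_{-\sqrt{4-x^2}}^{\sqrt{4-x^2}} \int_{\sqrt{x^2+y^2}}^{2} \sqrt{x^2 + y^2}\, dz\, dy\, dx \\
  &= 4 \int_{0}^{2} \int_{0}^{\sqrt{4-x^2}} \int_{\sqrt{x^2+y^2}}^{2} \sqrt{x^2 + y^2}\, dz\, dy\, dx \approx 8.37504476, \\
M_{yz} &= \int_{-2}^{2} \int_{-\sqrt{4-x^2}}^{\sqrt{4-x^2}} \int_{\sqrt{x^2+y^2}}^{2} x \sqrt{x^2 + y^2}\, dz\, dy\, dx \approx -5.55111512 \times 10^{-17}, \\
M_{xz} &= \int_{-2}^{2} \int_{-\sqrt{4-x^2}}^{\sqrt{4-x^2}} \int_{\sqrt{x^2+y^2}}^{2} y \sqrt{x^2 + y^2}\, dz\, dy\, dx \approx -8.01513675 \times 10^{-17}, \\
M_{xy} &= \int_{-2}^{2} \int_{-\sqrt{4-x^2}}^{\sqrt{4-x^2}} \int_{\sqrt{x^2+y^2}}^{2} z \sqrt{x^2 + y^2}\, dz\, dy\, dx \approx 13.40038156.
\end{aligned}
\]

This implies that the approximate location of the center of mass is

\[ (\bar{x}, \bar{y}, \bar{z}) = (0, 0, 1.60003701). \]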


These integrals are quite easy to evaluate directly. If you do this, you will find that the exact 
center of mass occurs at (0, 0, 1.6). 


Multiple integrals can be evaluated in Maple using the MultiInt command in the
MultivariateCalculus subpackage of the Student package. For example, to evaluate the multiple
integral

\[ \int_{2}^{4} \int_{x-1}^{x+6} \int_{-2}^{4+y^2} x^2 + y^2 + z\; dz\, dy\, dx \]

we first load the package and define the function with

with(Student[MultivariateCalculus]): f := (x, y, z) -> x^2 + y^2 + z

Then issue the command

MultiInt(f(x, y, z), z = -2..4 + y^2, y = x - 1..x + 6, x = 2..4)


which produces the result 


1.995885970 


EXERCISE SET 4.8

1. Use Algorithm 4.4 with n = m = 4 to approximate the following double integrals, and compare the
   results to the exact answers.
   a. $\int_{2.1}^{2.5} \int_{1.2}^{1.4} xy^2\, dy\, dx$
   b. $\int_{0}^{0.5} \int_{0}^{0.5} e^{y-x}\, dy\, dx$
   c. $\int_{2}^{2.2} \int_{x}^{2x} (x^2 + y^3)\, dy\, dx$
   d. $\int_{1}^{1.5} \int_{0}^{x} (x^2 + \sqrt{y})\, dy\, dx$


2. Find the smallest values for n = m so that Algorithm 4.4 can be used to approximate the integrals in
   Exercise 1 to within $10^{-6}$ of the actual value.

3. Use Algorithm 4.4 with (i) n = 4, m = 8, (ii) n = 8, m = 4, and (iii) n = m = 6 to approximate the
   following double integrals, and compare the results to the exact answers.
   a. $\int_{0}^{\pi/4} \int_{\sin x}^{\cos x} (2y \sin x + \cos^2 x)\, dy\, dx$
   b. $\int_{1}^{e} \int_{1}^{x} \ln xy\, dy\, dx$
   c. $\int_{0}^{1} \int_{x}^{2x} (x^2 + y^3)\, dy\, dx$
   d. $\int_{0}^{1} \int_{x}^{2x} (y^2 + x^3)\, dy\, dx$
   e. $\int_{0}^{\pi} \int_{0}^{x} \cos x\, dy\, dx$
   f. $\int_{0}^{\pi} \int_{0}^{x} \cos y\, dy\, dx$
   g. $\int_{0}^{\pi/4} \int_{0}^{\sin x} \frac{1}{\sqrt{1 - y^2}}\, dy\, dx$
   h. $\int_{-\pi}^{3\pi/2} \int_{0}^{2\pi} (y \sin x + x \cos y)\, dy\, dx$


4. Find the smallest values for n = m so that Algorithm 4.4 can be used to approximate the integrals in
   Exercise 3 to within $10^{-6}$ of the actual value.

5. Use Algorithm 4.5 with n = m = 2 to approximate the integrals in Exercise 1, and compare the
   results to those obtained in Exercise 1.

6. Find the smallest values of n = m so that Algorithm 4.5 can be used to approximate the integrals in
   Exercise 1 to within $10^{-6}$. Do not continue beyond n = m = 5. Compare the number of functional
   evaluations required to the number required in Exercise 2.

7. Use Algorithm 4.5 with (i) n = m = 3, (ii) n = 3, m = 4, (iii) n = 4, m = 3, and (iv) n = m = 4 to
   approximate the integrals in Exercise 3.

8. Use Algorithm 4.5 with n = m = 5 to approximate the integrals in Exercise 3. Compare the number
   of functional evaluations required to the number required in Exercise 4.


9. Use Algorithm 4.4 with n = m = 14 and Algorithm 4.5 with n = m = 4 to approximate

   \[ \iint_R e^{-(x+y)}\, dA \]

   for the region R in the plane bounded by the curves $y = x^2$ and $y = \sqrt{x}$.

10. Use Algorithm 4.4 to approximate

   \[ \iint_R \sqrt{xy + y^2}\, dA, \]

   where R is the region in the plane bounded by the lines x + y = 6, 3y - x = 2, and 3x - y = 2. First
   partition R into two regions $R_1$ and $R_2$ on which Algorithm 4.4 can be applied. Use n = m = 6 on
   both $R_1$ and $R_2$.
11. A plane lamina is a thin sheet of continuously distributed mass. If $\sigma$ is a function describing the
   density of a lamina having the shape of a region R in the xy-plane, then the center of the mass of the
   lamina $(\bar{x}, \bar{y})$ is

   \[ \bar{x} = \frac{\iint_R x\, \sigma(x, y)\, dA}{\iint_R \sigma(x, y)\, dA}, \qquad \bar{y} = \frac{\iint_R y\, \sigma(x, y)\, dA}{\iint_R \sigma(x, y)\, dA}. \]

   Use Algorithm 4.4 with n = m = 14 to find the center of mass of the lamina described by
   $R = \{ (x, y) \mid 0 \le x \le 1,\ 0 \le y \le \sqrt{1 - x^2} \}$ with the density function $\sigma(x, y) = e^{-(x^2 + y^2)}$. Compare the
   approximation to the exact result.

12. Repeat Exercise 11 using Algorithm 4.5 with n = m = 5.

13. The area of the surface described by z = f(x, y) for (x, y) in R is given by

   \[ \iint_R \sqrt{[f_x(x, y)]^2 + [f_y(x, y)]^2 + 1}\; dA. \]

   Use Algorithm 4.4 with n = m = 8 to find an approximation to the area of the surface on the
   hemisphere $x^2 + y^2 + z^2 = 9$, $z \ge 0$ that lies above the region in the plane described by
   $R = \{ (x, y) \mid 0 \le x \le 1,\ 0 \le y \le 1 \}$.

14. Repeat Exercise 13 using Algorithm 4.5 with n = m = 4.


15. Use Algorithm 4.6 with n = m = p = 2 to approximate the following triple integrals, and compare
   the results to the exact answers.
   a. $\int_{0}^{1} \int_{1}^{2} \int_{0}^{0.5} e^{x+y+z}\, dz\, dy\, dx$
   b. $\int_{0}^{1} \int_{x}^{1} \int_{0}^{y} y^2 z\, dz\, dy\, dx$
   c. $\int_{0}^{1} \int_{x^2}^{x} \int_{x-y}^{x+y} y\, dz\, dy\, dx$
   d. $\int_{0}^{1} \int_{x^2}^{x} \int_{x-y}^{x+y} z\, dz\, dy\, dx$
   e. $\int_{0}^{\pi} \int_{0}^{x} \int_{0}^{xy} \frac{1}{y} \sin\frac{z}{y}\, dz\, dy\, dx$
   f. $\int_{0}^{1} \int_{0}^{1} \int_{-xy}^{xy} e^{x^2 + y^2}\, dz\, dy\, dx$

16. Repeat Exercise 15 using n = m = p = 3.

17. Repeat Exercise 15 using n = m = p = 4 and n = m = p = 5.

18. Use Algorithm 4.6 with n = m = p = 4 to approximate

   \[ \iiint_S xy \sin(yz)\, dV, \]

   where S is the solid bounded by the coordinate planes and the planes $x = \pi$, $y = \pi/2$, $z = \pi/3$.
   Compare this approximation to the exact result.

19. Use Algorithm 4.6 with n = m = p = 5 to approximate

   \[ \iiint_S \sqrt{xyz}\, dV, \]

   where S is the region in the first octant bounded by the cylinder $x^2 + y^2 = 4$, the sphere $x^2 + y^2 + z^2 = 4$,
   and the plane x + y + z = 8. How many functional evaluations are required for the approximation?

4.9 Improper Integrals


Improper integrals result when the notion of integration is extended either to an interval 
of integration on which the function is unbounded or to an interval with one or more 
infinite endpoints. In either circumstance, the normal rules of integral approximation must 
be modified. 


Left Endpoint Singularity 


We will first consider the situation when the integrand is unbounded at the left endpoint 
of the interval of integration, as shown in Figure 4.25. In this case we say that f has a 
singularity at the endpoint a. We will then show how other improper integrals can be 
reduced to problems of this form. 


Figure 4.25 


It is shown in calculus that the improper integral with a singularity at the left endpoint,

\[ \int_a^b \frac{dx}{(x - a)^p}, \]

converges if and only if 0 < p < 1, and in this case, we define

\[ \int_a^b \frac{1}{(x - a)^p}\, dx = \lim_{M \to a^+} \left. \frac{(x - a)^{1-p}}{1 - p} \right|_{x=M}^{x=b} = \frac{(b - a)^{1-p}}{1 - p}. \]


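Example 1 Show that the improper integral $\int_0^1 \frac{1}{\sqrt{x}}\, dx$ converges but $\int_0^1 \frac{1}{x^2}\, dx$ diverges.

Solution For the first integral we have

\[ \int_0^1 \frac{1}{\sqrt{x}}\, dx = \lim_{M \to 0^+} \int_M^1 x^{-1/2}\, dx = \lim_{M \to 0^+} \left. 2x^{1/2} \right|_{x=M}^{x=1} = 2 - 0 = 2, \]

but the second integral

\[ \int_0^1 \frac{1}{x^2}\, dx = \lim_{M \to 0^+} \int_M^1 x^{-2}\, dx = \lim_{M \to 0^+} \left. -x^{-1} \right|_{x=M}^{x=1} \]

is unbounded.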


Copyright 2010 Cengage Learning. All Rights Reserved. May not be copied, scanned, or duplicated, in whole or in part. Due to electronic rights, some third party content may be suppressed from the eBook and/or eChapter(s). 
Editorial review has deemed that any suppressed content does not materially affect the overall learning experience. Cengage Learning reserves the right to remove additional content at any time if subsequent rights restrictions require it. 




If f is a function that can be written in the form 


\[ f(x) = \frac{g(x)}{(x - a)^p}, \]

where 0 < p < 1 and g is continuous on [a, b], then the improper integral

\[ \int_a^b f(x)\, dx \]

also exists. We will approximate this integral using the Composite Simpson's rule, provided
that $g \in C^5[a, b]$. In that case, we can construct the fourth Taylor polynomial, $P_4(x)$, for g
about a,

\[ P_4(x) = g(a) + g'(a)(x - a) + \frac{g''(a)}{2!}(x - a)^2 + \frac{g'''(a)}{3!}(x - a)^3 + \frac{g^{(4)}(a)}{4!}(x - a)^4, \]

and write

\[ \int_a^b f(x)\, dx = \int_a^b \frac{g(x) - P_4(x)}{(x - a)^p}\, dx + \int_a^b \frac{P_4(x)}{(x - a)^p}\, dx. \tag{4.44} \]

Because $P_4(x)$ is a polynomial, we can exactly determine the value of

\[ \int_a^b \frac{P_4(x)}{(x - a)^p}\, dx = \sum_{k=0}^{4} \int_a^b \frac{g^{(k)}(a)}{k!}(x - a)^{k-p}\, dx = \sum_{k=0}^{4} \frac{g^{(k)}(a)}{k!\,(k + 1 - p)}(b - a)^{k+1-p}. \tag{4.45} \]


This is generally the dominant portion of the approximation, especially when the Taylor 
polynomial P4(x) agrees closely with g(x) throughout the interval [a, b]. 
To approximate the integral of f, we must add to this value the approximation of 


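\[ \int_a^b \frac{g(x) - P_4(x)}{(x - a)^p}\, dx. \]

To determine this, we first define

\[ G(x) = \begin{cases} \dfrac{g(x) - P_4(x)}{(x - a)^p}, & \text{if } a < x \le b, \\[1ex] 0, & \text{if } x = a. \end{cases} \]

This gives us a continuous function on [a, b]. In fact, 0 < p < 1 and $P_4^{(k)}(a)$ agrees with
$g^{(k)}(a)$ for each k = 0, 1, 2, 3, 4, so we have $G \in C^4[a, b]$. This implies that the Composite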
Simpson’s rule can be applied to approximate the integral of G on [a,b]. Adding this 
approximation to the value in Eq. (4.45) gives an approximation to the improper integral of 
f on [a, b], within the accuracy of the Composite Simpson’s rule approximation. 
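The procedure just described can be sketched as below. This is an illustrative sketch under stated assumptions: the names `composite_simpson` and `left_singular_integral` are hypothetical, and the caller is assumed to supply the derivative values g(a), g'(a), ..., g''''(a) that define $P_4$. The sample call takes $g(x) = e^x$, a = 0, b = 1, p = 1/2, and four Simpson subintervals (h = 0.25).

```python
import math


def composite_simpson(func, a, b, n):
    """Composite Simpson's rule with n (even) subintervals."""
    h = (b - a) / n
    total = func(a) + func(b)
    for i in range(1, n):
        total += (4.0 if i % 2 else 2.0) * func(a + i * h)
    return h * total / 3.0


def left_singular_integral(g, g_derivs_at_a, a, b, p, n):
    """Approximate the integral of g(x)/(x - a)^p over [a, b], 0 < p < 1.

    g_derivs_at_a holds g(a), g'(a), ..., g''''(a), supplied by the
    caller; they define the fourth Taylor polynomial P4 about a.
    """
    coeffs = [dk / math.factorial(k) for k, dk in enumerate(g_derivs_at_a)]

    def p4(x):
        return sum(ck * (x - a) ** k for k, ck in enumerate(coeffs))

    # Exact integral of P4(x)/(x - a)^p, as in Eq. (4.45).
    dominant = sum(ck * (b - a) ** (k + 1 - p) / (k + 1 - p)
                   for k, ck in enumerate(coeffs))

    def G(x):
        # Remainder integrand, extended continuously by G(a) = 0.
        return 0.0 if x == a else (g(x) - p4(x)) / (x - a) ** p

    return dominant + composite_simpson(G, a, b, n)


approx = left_singular_integral(math.exp, [1.0] * 5, 0.0, 1.0, 0.5, 4)
print(approx)
```

Splitting off the Taylor part first means Simpson's rule is only ever applied to the bounded function G, never to the singular integrand itself.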


Example 2 Use Composite Simpson's rule with h = 0.25 to approximate the value of the improper
integral

\[ \int_0^1 \frac{e^x}{\sqrt{x}}\, dx. \]

Solution The fourth Taylor polynomial for $e^x$ about x = 0 is

\[ P_4(x) = 1 + x + \frac{x^2}{2} + \frac{x^3}{6} + \frac{x^4}{24}, \]

so the dominant portion of the approximation to $\int_0^1 \frac{e^x}{\sqrt{x}}\, dx$ is

\[
\begin{aligned}
\int_0^1 \frac{P_4(x)}{\sqrt{x}}\, dx &= \int_0^1 \left( x^{-1/2} + x^{1/2} + \frac{1}{2}x^{3/2} + \frac{1}{6}x^{5/2} + \frac{1}{24}x^{7/2} \right) dx \\
&= \lim_{M \to 0^+} \left[ 2x^{1/2} + \frac{2}{3}x^{3/2} + \frac{1}{5}x^{5/2} + \frac{1}{21}x^{7/2} + \frac{1}{108}x^{9/2} \right]_M^1 \\
&= 2 + \frac{2}{3} + \frac{1}{5} + \frac{1}{21} + \frac{1}{108} \approx 2.9235450.
\end{aligned}
\]

For the second portion of the approximation to $\int_0^1 \frac{e^x}{\sqrt{x}}\, dx$ we need to approximate
$\int_0^1 G(x)\, dx$, where

\[ G(x) = \begin{cases} \dfrac{1}{\sqrt{x}}\bigl(e^x - P_4(x)\bigr), & \text{if } 0 < x \le 1, \\[1ex] 0, & \text{if } x = 0. \end{cases} \]

Table 4.13 lists the values needed for the Composite Simpson's rule for this approximation.

Table 4.13
   x       G(x)
  0.00   0
  0.25   0.0000170
  0.50   0.0004013
  0.75   0.0026026
  1.00   0.0099485

Using these data and the Composite Simpson's rule gives

\[ \int_0^1 G(x)\, dx \approx \frac{0.25}{3}\bigl[ 0 + 4(0.0000170) + 2(0.0004013) + 4(0.0026026) + 0.0099485 \bigr] = 0.0017691. \]

Hence

\[ \int_0^1 \frac{e^x}{\sqrt{x}}\, dx \approx 2.9235450 + 0.0017691 = 2.9253141. \]

This result is accurate to within the accuracy of the Composite Simpson's rule approximation
for the function G. Because $|G^{(4)}(x)| < 1$ on [0, 1], the error is bounded by

\[ \frac{1 - 0}{180}(0.25)^4 = 0.0000217. \]


Right Endpoint Singularity 


To approximate the improper integral with a singularity at the right endpoint, we could 
develop a similar technique but expand in terms of the right endpoint b instead of the left 
endpoint a. Alternatively, we can make the substitution 


\[ z = -x, \quad dz = -dx \]

to change the improper integral into one of the form

\[ \int_a^b f(x)\, dx = \int_{-b}^{-a} f(-z)\, dz, \tag{4.46} \]


which has its singularity at the left endpoint. Then we can apply the left endpoint singularity 
technique we have already developed. (See Figure 4.26.) 
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Figure 4.26 


An improper integral with a singularity at c, where a < c < b, is treated as the sum of
improper integrals with endpoint singularities since

\[ \int_a^b f(x)\, dx = \int_a^c f(x)\, dx + \int_c^b f(x)\, dx. \]


Infinite Singularity 


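The other type of improper integral involves infinite limits of integration. The basic integral
of this type has the form

\[ \int_a^\infty \frac{1}{x^p}\, dx, \]

for p > 1. This is converted to an integral with left endpoint singularity at 0 by making the
integration substitution

\[ t = x^{-1}, \quad dt = -x^{-2}\, dx, \quad \text{so} \quad dx = -x^2\, dt = -t^{-2}\, dt. \]

Then

\[ \int_a^\infty \frac{1}{x^p}\, dx = \int_{1/a}^{0} -\frac{t^p}{t^2}\, dt = \int_0^{1/a} \frac{1}{t^{2-p}}\, dt. \]

In a similar manner, the variable change $t = x^{-1}$ converts the improper integral
$\int_a^\infty f(x)\, dx$ into one that has a left endpoint singularity at zero:

\[ \int_a^\infty f(x)\, dx = \int_0^{1/a} t^{-2} f\!\left( \frac{1}{t} \right) dt. \tag{4.47} \]

It can now be approximated using a quadrature formula of the type described earlier.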


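Example 3 Approximate the value of the improper integral

\[ I = \int_1^\infty x^{-3/2} \sin\frac{1}{x}\, dx. \]

Solution We first make the variable change $t = x^{-1}$, which converts the infinite singularity
into one with a left endpoint singularity. Then

\[ dt = -x^{-2}\, dx, \quad \text{so} \quad dx = -x^2\, dt = -\frac{1}{t^2}\, dt, \]

and

\[ I = \int_{x=1}^{x=\infty} x^{-3/2} \sin\frac{1}{x}\, dx = \int_{t=1}^{t=0} \left( \frac{1}{t} \right)^{-3/2} \sin t \left( -\frac{1}{t^2} \right) dt = \int_0^1 t^{-1/2} \sin t\, dt. \]

The fourth Taylor polynomial, $P_4(t)$, for sin t about 0 is

\[ P_4(t) = t - \frac{1}{6}t^3, \]

so

\[ G(t) = \begin{cases} \dfrac{\sin t - t + \frac{1}{6}t^3}{t^{1/2}}, & \text{if } 0 < t \le 1, \\[1ex] 0, & \text{if } t = 0 \end{cases} \]

is in $C^4[0, 1]$, and we have

\[
\begin{aligned}
I &= \int_0^1 t^{-1/2} \left( t - \frac{1}{6}t^3 \right) dt + \int_0^1 \frac{\sin t - t + \frac{1}{6}t^3}{t^{1/2}}\, dt \\
  &= \left[ \frac{2}{3}t^{3/2} - \frac{1}{21}t^{7/2} \right]_0^1 + \int_0^1 \frac{\sin t - t + \frac{1}{6}t^3}{t^{1/2}}\, dt \\
  &= 0.61904761 + \int_0^1 \frac{\sin t - t + \frac{1}{6}t^3}{t^{1/2}}\, dt.
\end{aligned}
\]

The result from the Composite Simpson's rule with n = 16 for the remaining integral is
0.0014890097. This gives a final approximation of

\[ I = 0.0014890097 + 0.61904761 = 0.62053661, \]

which is accurate to within $4.0 \times 10^{-8}$.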
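The computation in Example 3 can be checked with a short script. This is a sketch for illustration only; the helper name `composite_simpson` is hypothetical, and the dominant part 2/3 - 1/21 is entered directly from the closed form obtained above.

```python
import math


def composite_simpson(func, a, b, n):
    """Composite Simpson's rule with n (even) subintervals."""
    h = (b - a) / n
    total = func(a) + func(b)
    for i in range(1, n):
        total += (4.0 if i % 2 else 2.0) * func(a + i * h)
    return h * total / 3.0


def G(t):
    """(sin t - P4(t)) / sqrt(t) with P4(t) = t - t^3/6, and G(0) = 0."""
    if t == 0.0:
        return 0.0
    return (math.sin(t) - t + t ** 3 / 6.0) / math.sqrt(t)


# After t = 1/x the integral becomes the integral of t^(-1/2) sin t over [0, 1].
dominant = 2.0 / 3.0 - 1.0 / 21.0          # exact integral of t^(-1/2) * P4(t)
remainder = composite_simpson(G, 0.0, 1.0, 16)
approx = dominant + remainder
print(dominant, remainder, approx)
```

Almost all of the value comes from the exactly integrated Taylor part; Simpson's rule contributes only a small correction of order $10^{-3}$.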


EXERCISE SET 4.9

1. Use Simpson's Composite rule and the given values of n to approximate the following improper
   integrals.
   a. $\int_0^1 x^{-1/4} \sin x\, dx$, n = 4
   b. $\int_0^1 \frac{e^{2x}}{\sqrt[5]{x^2}}\, dx$, n = 6
   c. $\int_1^2 \frac{\ln x}{(x - 1)^{1/5}}\, dx$, n = 8
   d. $\int_0^1 \frac{\cos 2x}{x^{1/3}}\, dx$, n = 6

2. Use the Composite Simpson's rule and the given values of n to approximate the following improper
   integrals.
   a. $\int_0^1 \frac{e^{-x}}{\sqrt{1 - x}}\, dx$, n = 6
   b. $\int_0^2 \frac{x e^x}{\sqrt[3]{(x - 1)^2}}\, dx$, n = 8

3. Use the transformation $t = x^{-1}$ and then the Composite Simpson's rule and the given values of n to
   approximate the following improper integrals.
   a. $\int_1^\infty \frac{1}{x^2 + 9}\, dx$, n = 4
   b. $\int_1^\infty \frac{1}{1 + x^4}\, dx$, n = 4
   c. $\int_1^\infty \frac{\cos x}{x^3}\, dx$, n = 6
   d. $\int_1^\infty x^{-4} \sin x\, dx$, n = 6


4. The improper integral $\int_0^\infty f(x)\, dx$ cannot be converted into an integral with finite limits using the
   substitution t = 1/x because the limit at zero becomes infinite. The problem is resolved by first
   writing $\int_0^\infty f(x)\, dx = \int_0^1 f(x)\, dx + \int_1^\infty f(x)\, dx$. Apply this technique to approximate the following
   improper integrals to within $10^{-6}$.
   a. $\int_0^\infty \frac{1}{1 + x^4}\, dx$
   b. $\int_0^\infty \frac{1}{(1 + x^2)^3}\, dx$

5. Suppose a body of mass m is traveling vertically upward starting at the surface of the earth. If all
   resistance except gravity is neglected, the escape velocity v is given by

   \[ v^2 = 2gR \int_1^\infty z^{-2}\, dz, \quad \text{where } z = \frac{x}{R}, \]

   R = 3960 miles is the radius of the earth, and g = 0.00609 mi/s^2 is the force of gravity at the earth's
   surface. Approximate the escape velocity v.

6. The Laguerre polynomials $\{L_0(x), L_1(x), \ldots\}$ form an orthogonal set on $[0, \infty)$ and satisfy
   $\int_0^\infty e^{-x} L_i(x) L_j(x)\, dx = 0$, for $i \ne j$. (See Section 8.2.) The polynomial $L_n(x)$ has n distinct
   zeros $x_1, x_2, \ldots, x_n$ in $[0, \infty)$. Let

   \[ c_{n,i} = \int_0^\infty e^{-x} \prod_{\substack{j=1 \\ j \ne i}}^{n} \frac{x - x_j}{x_i - x_j}\, dx. \]

   Show that the quadrature formula

   \[ \int_0^\infty f(x) e^{-x}\, dx = \sum_{i=1}^{n} c_{n,i} f(x_i) \]

   has degree of precision 2n - 1. (Hint: Follow the steps in the proof of Theorem 4.7.)

7. The Laguerre polynomials $L_0(x) = 1$, $L_1(x) = 1 - x$, $L_2(x) = x^2 - 4x + 2$, and
   $L_3(x) = -x^3 + 9x^2 - 18x + 6$ are derived in Exercise 11 of Section 8.2. As shown in Exercise 6, these polynomials
   are useful in approximating integrals of the form

   \[ \int_0^\infty e^{-x} f(x)\, dx. \]

   a. Derive the quadrature formula using n = 2 and the zeros of $L_2(x)$.
   b. Derive the quadrature formula using n = 3 and the zeros of $L_3(x)$.

8. Use the quadrature formulas derived in Exercise 7 to approximate the integral

   \[ \int_0^\infty \sqrt{x}\, e^{-x}\, dx. \]


9. Use the quadrature formulas derived in Exercise 7 to approximate the integral 
| 
/ 7a46<u2 dx. 
0 lL+x 
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CHAPTER 4 


4.10 Survey of Methods and Software


In this chapter we considered approximating integrals of functions of one, two, or three 
variables, and approximating the derivatives of a function of a single real variable. 

The Midpoint rule, Trapezoidal rule, and Simpson’s rule were studied to introduce the 
techniques and error analysis of quadrature methods. Composite Simpson’s rule is easy to 
use and produces accurate approximations unless the function oscillates in a subinterval 
of the interval of integration. Adaptive quadrature can be used if the function is suspected 
of oscillatory behavior. To minimize the number of nodes while maintaining accuracy, we 
used Gaussian quadrature. Romberg integration was introduced to take advantage of the 
easily applied Composite Trapezoidal rule and extrapolation. 

Most software for integrating a function of a single real variable is based either on the
adaptive approach or on extremely accurate Gaussian formulas. Cautious Romberg integration
is an adaptive technique that includes a check to make sure that the integrand is smoothly
behaved over subintervals of the interval of integration. This method has been successfully
used in software libraries. Multiple integrals are generally approximated by extending good
adaptive methods to higher dimensions. Gaussian-type quadrature is also recommended to
decrease the number of function evaluations.

The main routines in both the IMSL and NAG Libraries are based on QUADPACK: 
A Subroutine Package for Automatic Integration by R. Piessens, E. de Doncker-Kapenga, 
C. W. Überhuber, and D. K. Kahaner, published by Springer-Verlag in 1983 [PDUK].

The IMSL Library contains an adaptive integration scheme based on the 21-point
Gaussian-Kronrod rule using the 10-point Gaussian rule for error estimation. The Gaussian
rule uses the ten points $x_1, \ldots, x_{10}$ and weights $w_1, \ldots, w_{10}$ to give the quadrature formula
$\sum_{i=1}^{10} w_i f(x_i)$ to approximate $\int_a^b f(x)\, dx$. The additional points $x_{11}, \ldots, x_{21}$, and the new
weights $v_1, \ldots, v_{21}$, are then used in the Kronrod formula $\sum_{i=1}^{21} v_i f(x_i)$. The results of the
two formulas are compared to eliminate error. The advantage in using $x_1, \ldots, x_{10}$ in each
formula is that f needs to be evaluated only at 21 points. If independent 10- and 21-point
Gaussian rules were used, 31 function evaluations would be needed. This procedure permits
endpoint singularities in the integrand.

Other IMSL subroutines allow for endpoint singularities, user-specified singularities,
and infinite intervals of integration. In addition, there are routines for applying Gauss-Kronrod
rules to integrate a function of two variables, and a routine to use Gaussian quadrature
to integrate a function of n variables over n intervals of the form $[a_i, b_i]$.

The NAG Library includes a routine to compute the integral of f over the interval 
[a, b] using an adaptive method based on Gaussian Quadrature using Gauss 10-point and 
Kronrod 21-point rules. It also has a routine to approximate an integral using a family of 
Gaussian-type formulas based on 1, 3, 5,7, 15,31, 63, 127, and 255 nodes. These interlacing 
high-precision rules are due to Patterson [Pat] and are used in an adaptive manner. NAG 
includes many other subroutines for approximating integrals. 

MATLAB has a routine to approximate a definite integral using an adaptive Simpson’s 
rule, and another to approximate the definite integral using an adaptive eight-panel Newton- 
Cotes rule. 

Although numerical differentiation is unstable, derivative approximation formulas are 
needed for solving differential equations. The NAG Library includes a subroutine for the 
numerical differentiation of a function of one real variable with differentiation to the four- 
teenth derivative being possible. IMSL has a function that uses an adaptive change in step 
size for finite differences to approximate the first, second, or third derivative of f at x to
within a given tolerance. IMSL also includes a subroutine to compute the derivatives of a 
function defined on a set of points using quadratic interpolation. Both packages allow the 
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differentiation and integration of interpolatory cubic splines constructed by the subroutines 
mentioned in Section 3.5. 

For further reading on numerical integration we recommend the books by Engels [E] 
and by Davis and Rabinowitz [DR]. For more information on Gaussian quadrature see 
Stroud and Secrest [StS]. Books on multiple integrals include those by Stroud [Stro] and 
by Sloan and Joe [SJ]. 
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Initial-Value Problems 
for Ordinary Differential Equations 


Introduction 


The motion of a swinging pendulum under certain simplifying assumptions is described by 
the second-order differential equation 


\[ \frac{d^2\theta}{dt^2} + \frac{g}{L}\sin\theta = 0, \]

where L is the length of the pendulum, $g \approx 32.17$ ft/s^2 is the gravitational constant of the
earth, and $\theta$ is the angle the pendulum makes with the vertical. If, in addition, we specify
the position of the pendulum when the motion begins, $\theta(t_0) = \theta_0$, and its velocity at that
point, $\theta'(t_0) = \theta_0'$, we have what is called an initial-value problem.

For small values of $\theta$, the approximation $\theta \approx \sin\theta$ can be used to simplify this problem
to the linear initial-value problem

\[ \frac{d^2\theta}{dt^2} + \frac{g}{L}\theta = 0, \quad \theta(t_0) = \theta_0, \quad \theta'(t_0) = \theta_0'. \]

This problem can be solved by a standard differential-equation technique. For larger values
of $\theta$, the assumption that $\theta = \sin\theta$ is not reasonable so approximation methods must be
used. A problem of this type is considered in Exercise 8 of Section 5.9.

Any textbook on ordinary differential equations details a number of methods for ex- 
plicitly finding solutions to first-order initial-value problems. In practice, however, few of 
the problems originating from the study of physical phenomena can be solved exactly. 
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CHAPTER 5 




The first part of this chapter is concerned with approximating the solution y(t) to a
problem of the form

\[ \frac{dy}{dt} = f(t, y), \quad \text{for } a \le t \le b, \]

subject to an initial condition $y(a) = \alpha$. Later in the chapter we deal with the extension of
these methods to a system of first-order differential equations in the form

\[
\begin{aligned}
\frac{dy_1}{dt} &= f_1(t, y_1, y_2, \ldots, y_n), \\
\frac{dy_2}{dt} &= f_2(t, y_1, y_2, \ldots, y_n), \\
&\;\;\vdots \\
\frac{dy_n}{dt} &= f_n(t, y_1, y_2, \ldots, y_n),
\end{aligned}
\]

for $a \le t \le b$, subject to the initial conditions

\[ y_1(a) = \alpha_1, \quad y_2(a) = \alpha_2, \quad \ldots, \quad y_n(a) = \alpha_n. \]

We also examine the relationship of a system of this type to the general nth-order initial-value
problem of the form

\[ y^{(n)} = f(t, y, y', y'', \ldots, y^{(n-1)}), \]

for $a \le t \le b$, subject to the initial conditions

\[ y(a) = \alpha_1, \quad y'(a) = \alpha_2, \quad \ldots, \quad y^{(n-1)}(a) = \alpha_n. \]


5.1 The Elementary Theory of Initial-Value Problems


Differential equations are used to model problems in science and engineering that involve 
the change of some variable with respect to another. Most of these problems require the 
solution of an initial-value problem, that is, the solution to a differential equation that 
satisfies a given initial condition. 

In common real-life situations, the differential equation that models the problem is too 
complicated to solve exactly, and one of two approaches is taken to approximate the solution. 
The first approach is to modify the problem by simplifying the differential equation to one 
that can be solved exactly and then use the solution of the simplified equation to approximate 
the solution to the original problem. The other approach, which we will examine in this 
chapter, uses methods for approximating the solution of the original problem. This is the 
approach that is most commonly taken because the approximation methods give more 
accurate results and realistic error information. 

The methods that we consider in this chapter do not produce a continuous approxima- 
tion to the solution of the initial-value problem. Rather, approximations are found at certain 
specified, and often equally spaced, points. Some method of interpolation, commonly Her- 
mite, is used if intermediate values are needed. 

We need some definitions and results from the theory of ordinary differential equations
before considering methods for approximating the solutions to initial-value problems.

Definition 5.1 A function f(t, y) is said to satisfy a Lipschitz condition in the variable y on a set $D \subset \mathbb{R}^2$
if a constant L > 0 exists with

\[ |f(t, y_1) - f(t, y_2)| \le L\,|y_1 - y_2|, \]

whenever $(t, y_1)$ and $(t, y_2)$ are in D. The constant L is called a Lipschitz constant for f.

Rudolf Lipschitz (1832-1903) worked in many branches of mathematics, including number
theory, Fourier series, differential equations, analytical mechanics, and potential theory. He is best
known for this generalization of the work of Augustin-Louis Cauchy (1789-1857) and
Giuseppe Peano (1858-1932).

Example 1 Show that f(t, y) = t|y| satisfies a Lipschitz condition on the interval
$D = \{ (t, y) \mid 1 \le t \le 2 \text{ and } -3 \le y \le 4 \}$.

Solution For each pair of points $(t, y_1)$ and $(t, y_2)$ in D we have

\[ |f(t, y_1) - f(t, y_2)| = \bigl|\, t|y_1| - t|y_2| \,\bigr| = |t|\, \bigl|\, |y_1| - |y_2| \,\bigr| \le 2\,|y_1 - y_2|. \]

Thus f satisfies a Lipschitz condition on D in the variable y with Lipschitz constant 2. The
smallest value possible for the Lipschitz constant for this problem is L = 2, because, for
example,

\[ |f(2, 1) - f(2, 0)| = |2 - 0| = 2\,|1 - 0|. \]

Definition 5.2 A set $D \subset \mathbb{R}^2$ is said to be convex if whenever $(t_1, y_1)$ and $(t_2, y_2)$ belong to D, then
$((1 - \lambda)t_1 + \lambda t_2,\ (1 - \lambda)y_1 + \lambda y_2)$ also belongs to D for every $\lambda$ in [0, 1].

In geometric terms, Definition 5.2 states that a set is convex provided that whenever
two points belong to the set, the entire straight-line segment between the points also belongs
to the set. (See Figure 5.1.) The sets we consider in this chapter are generally of the form
$D = \{ (t, y) \mid a \le t \le b \text{ and } -\infty < y < \infty \}$ for some constants a and b. It is easy to verify
(see Exercise 7) that these sets are convex.

Figure 5.1 [Two regions, one labeled Convex and one labeled Not convex.]

Theorem 5.3 Suppose f(t, y) is defined on a convex set $D \subset \mathbb{R}^2$. If a constant L > 0 exists with

\[ \left| \frac{\partial f}{\partial y}(t, y) \right| \le L, \quad \text{for all } (t, y) \in D, \tag{5.1} \]

then f satisfies a Lipschitz condition on D in the variable y with Lipschitz constant L.


The proof of Theorem 5.3 is discussed in Exercise 6; it is similar to the proof of the 
corresponding result for functions of one variable discussed in Exercise 27 of Section 1.1. 
As the next theorem will show, it is often of significant interest to determine whether
the function involved in an initial-value problem satisfies a Lipschitz condition in its second
variable, and condition (5.1) is generally easier to apply than the definition. We should
note, however, that Theorem 5.3 gives only sufficient conditions for a Lipschitz condition
to hold. The function in Example 1, for instance, satisfies a Lipschitz condition, but the
partial derivative with respect to $y$ does not exist when $y = 0$.

The following theorem is a version of the fundamental existence and uniqueness theorem
for first-order ordinary differential equations. Although the theorem can be proved
with the hypothesis reduced somewhat, this form of the theorem is sufficient for our purposes.
(The proof of the theorem, in approximately this form, can be found in [BiR],
pp. 142-155.)

Theorem 5.4  Suppose that $D = \{(t, y) \mid a \le t \le b \text{ and } -\infty < y < \infty\}$ and that $f(t, y)$ is continuous on $D$. If $f$ satisfies a Lipschitz condition on $D$ in the variable $y$, then the initial-value problem

\[ y'(t) = f(t, y), \quad a \le t \le b, \quad y(a) = \alpha, \]

has a unique solution $y(t)$ for $a \le t \le b$.

Example 2  Use Theorem 5.4 to show that there is a unique solution to the initial-value problem

\[ y' = 1 + t\sin(ty), \quad 0 \le t \le 2, \quad y(0) = 0. \]

Solution  Holding $t$ constant and applying the Mean Value Theorem to the function

\[ f(t, y) = 1 + t\sin(ty), \]

we find that when $y_1 < y_2$, a number $\xi$ in $(y_1, y_2)$ exists with

\[ \frac{f(t, y_2) - f(t, y_1)}{y_2 - y_1} = \frac{\partial}{\partial y} f(t, \xi) = t^2\cos(\xi t). \]

Thus

\[ |f(t, y_2) - f(t, y_1)| = |y_2 - y_1|\,|t^2\cos(\xi t)| \le 4\,|y_2 - y_1|, \]

and $f$ satisfies a Lipschitz condition in the variable $y$ with Lipschitz constant $L = 4$.
Additionally, $f(t, y)$ is continuous when $0 \le t \le 2$ and $-\infty < y < \infty$, so Theorem 5.4
implies that a unique solution exists to this initial-value problem.

If you have completed a course in differential equations you might try to find the exact
solution to this problem.
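The Lipschitz constants found in Examples 1 and 2 can be sanity-checked numerically. The following Python sketch samples the difference quotient $|f(t, y_1) - f(t, y_2)| / |y_1 - y_2|$ on a grid and reports the largest value seen. The helper name `max_lipschitz_ratio`, the grid size, and the finite window $-5 \le y \le 5$ used for Example 2 (whose set $D$ is unbounded in $y$) are assumptions made only for this illustration. A sampled maximum can support a claimed constant but does not prove it.

```python
import math

def max_lipschitz_ratio(f, t_range, y_range, n=60):
    """Largest sampled value of |f(t,y1) - f(t,y2)| / |y1 - y2| on an (n+1)-point grid."""
    t_lo, t_hi = t_range
    y_lo, y_hi = y_range
    ts = [t_lo + (t_hi - t_lo) * i / n for i in range(n + 1)]
    ys = [y_lo + (y_hi - y_lo) * j / n for j in range(n + 1)]
    best = 0.0
    for t in ts:
        for j, y1 in enumerate(ys):
            for y2 in ys[j + 1:]:          # distinct grid points, so y1 != y2
                ratio = abs(f(t, y1) - f(t, y2)) / abs(y1 - y2)
                best = max(best, ratio)
    return best

def f1(t, y):
    return t * abs(y)                      # Example 1, claimed L = 2

def f2(t, y):
    return 1 + t * math.sin(t * y)         # Example 2, claimed L = 4

r1 = max_lipschitz_ratio(f1, (1.0, 2.0), (-3.0, 4.0))
r2 = max_lipschitz_ratio(f2, (0.0, 2.0), (-5.0, 5.0))   # y-window is an assumption
print(r1, r2)
```

The sampled ratio for Example 1 reaches the constant 2, in line with the remark that 2 is the smallest possible Lipschitz constant there. For Example 2 the sampled ratio stays below 4.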


Well-Posed Problems 


Now that we have, to some extent, taken care of the question of when initial-value prob- 
lems have unique solutions, we can move to the second important consideration when 
approximating the solution to an initial-value problem. Initial-value problems obtained by 
observing physical phenomena generally only approximate the true situation, so we need 
to know whether small changes in the statement of the problem introduce correspondingly 
small changes in the solution. This is also important because of the introduction of round-off 
error when numerical methods are used. That is, 


• Question: How do we determine whether a particular problem has the property that small
changes, or perturbations, in the statement of the problem introduce correspondingly
small changes in the solution?


As usual, we first need to give a workable definition to express this concept.

Definition 5.5  The initial-value problem

\[ \frac{dy}{dt} = f(t, y), \quad a \le t \le b, \quad y(a) = \alpha, \tag{5.2} \]

is said to be a well-posed problem if:

• A unique solution, $y(t)$, to the problem exists, and

• There exist constants $\varepsilon_0 > 0$ and $k > 0$ such that for any $\varepsilon$, with $\varepsilon_0 > \varepsilon > 0$,
whenever $\delta(t)$ is continuous with $|\delta(t)| < \varepsilon$ for all $t$ in $[a, b]$, and when $|\delta_0| < \varepsilon$, the
initial-value problem

\[ \frac{dz}{dt} = f(t, z) + \delta(t), \quad a \le t \le b, \quad z(a) = \alpha + \delta_0, \tag{5.3} \]

has a unique solution $z(t)$ that satisfies

\[ |z(t) - y(t)| < k\varepsilon \quad \text{for all } t \text{ in } [a, b]. \]


The problem specified by (5.3) is called a perturbed problem associated with the
original problem (5.2). It assumes the possibility of an error $\delta(t)$ being introduced in the statement
of the differential equation, as well as an error $\delta_0$ being present in the initial condition.

Numerical methods will always be concerned with solving a perturbed problem because 
any round-off error introduced in the representation perturbs the original problem. Unless 
the original problem is well-posed, there is little reason to expect that the numerical solution 
to a perturbed problem will accurately approximate the solution to the original problem. 

The following theorem specifies conditions that ensure that an initial-value problem is 
well-posed. The proof of this theorem can be found in [BiR], pp. 142-147. 


Theorem 5.6  Suppose $D = \{(t, y) \mid a \le t \le b \text{ and } -\infty < y < \infty\}$. If $f$ is continuous and satisfies a
Lipschitz condition in the variable $y$ on the set $D$, then the initial-value problem

\[ \frac{dy}{dt} = f(t, y), \quad a \le t \le b, \quad y(a) = \alpha \]

is well-posed.


Example 3  Show that the initial-value problem

\[ \frac{dy}{dt} = y - t^2 + 1, \quad 0 \le t \le 2, \quad y(0) = 0.5, \tag{5.4} \]

is well-posed on $D = \{(t, y) \mid 0 \le t \le 2 \text{ and } -\infty < y < \infty\}$.

Solution  Because

\[ \left|\frac{\partial (y - t^2 + 1)}{\partial y}\right| = |1| = 1, \]

Theorem 5.3 implies that $f(t, y) = y - t^2 + 1$ satisfies a Lipschitz condition in $y$ on $D$ with
Lipschitz constant 1. Since $f$ is continuous on $D$, Theorem 5.6 implies that the problem is
well-posed.

As an illustration, consider the solution to the perturbed problem

\[ \frac{dz}{dt} = z - t^2 + 1 + \delta, \quad 0 \le t \le 2, \quad z(0) = 0.5 + \delta_0, \tag{5.5} \]

where $\delta$ and $\delta_0$ are constants. The solutions to Eqs. (5.4) and (5.5) are

\[ y(t) = (t + 1)^2 - 0.5e^t \quad \text{and} \quad z(t) = (t + 1)^2 + (\delta + \delta_0 - 0.5)e^t - \delta, \]

respectively.
Suppose that $\varepsilon$ is a positive number. If $|\delta| < \varepsilon$ and $|\delta_0| < \varepsilon$, then

\[ |y(t) - z(t)| = |(\delta + \delta_0)e^t - \delta| \le |\delta + \delta_0|e^2 + |\delta| \le (2e^2 + 1)\varepsilon, \]

for all $t$ in $[0, 2]$. This implies that problem (5.4) is well-posed with $k(\varepsilon) = 2e^2 + 1$ for all $\varepsilon > 0$.

Maple reserves the letter D to represent differentiation.
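The bound in Example 3 can be checked directly from the two closed-form solutions. The sketch below, in Python, evaluates $|y(t) - z(t)|$ on a grid over $[0, 2]$ and compares the largest gap with $(2e^2 + 1)\varepsilon$. The function names, the grid size, and the sample perturbation values are assumptions chosen only for illustration.

```python
import math

def y_exact(t):
    """Solution of (5.4): y' = y - t^2 + 1, y(0) = 0.5."""
    return (t + 1) ** 2 - 0.5 * math.exp(t)

def z_perturbed(t, delta, delta0):
    """Solution of (5.5): z' = z - t^2 + 1 + delta, z(0) = 0.5 + delta0."""
    return (t + 1) ** 2 + (delta + delta0 - 0.5) * math.exp(t) - delta

def max_gap(delta, delta0, n=200):
    """Largest sampled |y(t) - z(t)| on an (n+1)-point grid over [0, 2]."""
    return max(
        abs(y_exact(2 * i / n) - z_perturbed(2 * i / n, delta, delta0))
        for i in range(n + 1)
    )

K = 2 * math.exp(2) + 1        # the constant k from Example 3

eps = 1e-3                     # sample tolerance (illustrative)
print(max_gap(0.9 * eps, 0.9 * eps), K * eps)
```

For perturbations with $|\delta| < \varepsilon$ and $|\delta_0| < \varepsilon$, the sampled gap stays below $k\varepsilon$, as the definition of a well-posed problem requires.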


Maple can be used to solve many initial-value problems. Consider the problem

\[ \frac{dy}{dt} = y - t^2 + 1, \quad 0 \le t \le 2, \quad y(0) = 0.5. \]

To define the differential equation and initial condition, enter

deq := D(y)(t) = y(t) - t^2 + 1; init := y(0) = 0.5

The names deq and init have been chosen by the user. The command to solve the initial-value
problem is

deqsol := dsolve({deq, init}, y(t))

and Maple responds with

\[ y(t) = 1 + t^2 + 2t - \frac{1}{2}e^t \]

To use the solution to obtain a specific value, such as $y(1.5)$, we enter

q := rhs(deqsol) : evalf(subs(t = 1.5, q))

which gives

4.009155465

The function rhs (for right hand side) is used to assign the solution of the initial-value
problem to the function q, which we then evaluate at $t = 1.5$.

The function dsolve can fail if an explicit solution to the initial-value problem cannot
be found. For example, for the initial-value problem given in Example 2, the command

deqsol2 := dsolve({D(y)(t) = 1 + t * sin(t * y(t)), y(0) = 0}, y(t))

does not succeed because an explicit solution cannot be found. In this case a numerical
method must be used.


EXERCISE SET 5.1 


1. Use Theorem 5.4 to show that each of the following initial-value problems has a unique solution, and
find the solution.
   a. $y' = y\cos t$, $0 \le t \le 1$, $y(0) = 1$.
   b. $y' = \frac{2}{t}y + t^2e^t$, $1 \le t \le 2$, $y(1) = 0$.
   c. $y' = -\frac{2}{t}y + t^2e^t$, $1 \le t \le 2$, $y(1) = \sqrt{2}\,e$.
   d. $y' = \frac{4t^3y}{1 + t^4}$, $0 \le t \le 1$, $y(0) = 1$.

2. Show that each of the following initial-value problems has a unique solution and find the solution.
Can Theorem 5.4 be applied in each case?
   a. $y' = e^{t-y}$, $0 \le t \le 1$, $y(0) = 1$.
   b. $y' = t^{-2}(\sin 2t - 2ty)$, $1 \le t \le 2$, $y(1) = 2$.
   c. $y' = -y + ty^{1/2}$, $2 \le t \le 3$, $y(2) = 2$.


   d. $y' = \frac{ty + y}{ty + t}$, $2 \le t \le 4$, $y(2) = 4$.


3. For each choice of $f(t, y)$ given in parts (a)-(d):
   i. Does $f$ satisfy a Lipschitz condition on $D = \{(t, y) \mid 0 \le t \le 1, -\infty < y < \infty\}$?
   ii. Can Theorem 5.6 be used to show that the initial-value problem
      \[ y' = f(t, y), \quad 0 \le t \le 1, \quad y(0) = 1, \]
      is well-posed?
   a. $f(t, y) = t^2y + 1$   b. $f(t, y) = ty$   c. $f(t, y) = 1 - y$   d. $f(t, y) = -ty + \frac{4t}{y}$
4. For each choice of $f(t, y)$ given in parts (a)-(d):
   i. Does $f$ satisfy a Lipschitz condition on $D = \{(t, y) \mid 0 \le t \le 1, -\infty < y < \infty\}$?
   ii. Can Theorem 5.6 be used to show that the initial-value problem
      \[ y' = f(t, y), \quad 0 \le t \le 1, \quad y(0) = 1, \]
      is well-posed?
   a. $f(t, y) = e^{t-y}$   b. $f(t, y) = \frac{1 + y}{1 + t}$   c. $f(t, y) = \cos(yt)$   d. $f(t, y) = \frac{y^2}{1 + t}$
5. For the following initial-value problems, show that the given equation implicitly defines a solution.
Approximate $y(2)$ using Newton’s method.
   a. $y' = -\frac{y^3 + y}{(3y^2 + 1)t}$, $1 \le t \le 2$, $y(1) = 1$; $\quad y^3t + yt = 2$
   b. $y' = -\frac{y\cos t + 2te^y}{\sin t + t^2e^y + 2}$, $1 \le t \le 2$, $y(1) = 0$; $\quad y\sin t + t^2e^y + 2y = 1$
6. Prove Theorem 5.3 by applying the Mean Value Theorem 1.8 to $f(t, y)$, holding $t$ fixed.
7. Show that, for any constants $a$ and $b$, the set $D = \{(t, y) \mid a \le t \le b, -\infty < y < \infty\}$ is convex.


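8. Suppose the perturbation $\delta(t)$ is proportional to $t$, that is, $\delta(t) = \delta t$ for some constant $\delta$. Show directly
that the following initial-value problems are well-posed.
   a. $y' = 1 - y$, $0 \le t \le 2$, $y(0) = 0$
   b. $y' = t + y$, $0 \le t \le 2$, $y(0) = -1$
   c. $y' = \frac{2}{t}y + t^2e^t$, $1 \le t \le 2$, $y(1) = 0$
   d. $y' = -\frac{2}{t}y + t^2e^t$, $1 \le t \le 2$, $y(1) = \sqrt{2}\,e$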


9. Picard’s method for solving the initial-value problem
   \[ y' = f(t, y), \quad a \le t \le b, \quad y(a) = \alpha, \]
   is described as follows: Let $y_0(t) = \alpha$ for each $t$ in $[a, b]$. Define a sequence $\{y_k(t)\}$ of functions by
   \[ y_k(t) = \alpha + \int_a^t f(\tau, y_{k-1}(\tau))\,d\tau, \quad k = 1, 2, \ldots. \]
   a. Integrate $y' = f(t, y(t))$, and use the initial condition to derive Picard’s method.
   b. Generate $y_0(t)$, $y_1(t)$, $y_2(t)$, and $y_3(t)$ for the initial-value problem
      \[ y' = -y + t + 1, \quad 0 \le t \le 1, \quad y(0) = 1. \]
   c. Compare the result in part (b) to the Maclaurin series of the actual solution $y(t) = t + e^{-t}$.

5.2 Euler’s Method


Euler’s method is the most elementary approximation technique for solving initial-value 
problems. Although it is seldom used in practice, the simplicity of its derivation can be 
used to illustrate the techniques involved in the construction of some of the more advanced 
techniques, without the cumbersome algebra that accompanies these constructions. 

The object of Euler’s method is to obtain approximations to the well-posed initial-value 
problem 


\[ \frac{dy}{dt} = f(t, y), \quad a \le t \le b, \quad y(a) = \alpha. \tag{5.6} \]


A continuous approximation to the solution y(t) will not be obtained; instead, approx- 
imations to y will be generated at various values, called mesh points, in the interval [a, b]. 
Once the approximate solution is obtained at the points, the approximate solution at other 
points in the interval can be found by interpolation. 
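As a minimal sketch of that last step, the following Python function estimates the solution between mesh points by piecewise-linear interpolation. Linear interpolation is an assumption made here for simplicity (other schemes, such as Hermite interpolation, are common), and the function name and the sample mesh values are hypothetical.

```python
from bisect import bisect_right

def interpolate_linear(mesh, values, t):
    """Piecewise-linear estimate at t from values given at increasing mesh points."""
    if not mesh[0] <= t <= mesh[-1]:
        raise ValueError("t lies outside the mesh")
    # Index of the right end of the subinterval containing t.
    j = min(bisect_right(mesh, t), len(mesh) - 1)
    t0, t1 = mesh[j - 1], mesh[j]
    w0, w1 = values[j - 1], values[j]
    return w0 + (w1 - w0) * (t - t0) / (t1 - t0)

# Hypothetical approximations w_i at mesh points t_i = 0, 0.5, 1.0.
mesh = [0.0, 0.5, 1.0]
values = [0.5, 1.25, 2.25]
print(interpolate_linear(mesh, values, 0.25))
```

The interpolated values can be no more accurate than the mesh values themselves, so the interpolation error adds to the error of the underlying approximation.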

We first make the stipulation that the mesh points are equally distributed throughout 
the interval [a, b]. This condition is ensured by choosing a positive integer N and selecting 
the mesh points 


\[ t_i = a + ih, \quad \text{for each } i = 0, 1, 2, \ldots, N. \]


The common distance between the points $h = (b - a)/N = t_{i+1} - t_i$ is called the step size.


We will use Taylor’s Theorem to derive Euler’s method. Suppose that $y(t)$, the unique
solution to (5.6), has two continuous derivatives on $[a, b]$, so that for each $i = 0, 1, 2, \ldots, N - 1$,

\[ y(t_{i+1}) = y(t_i) + (t_{i+1} - t_i)\,y'(t_i) + \frac{(t_{i+1} - t_i)^2}{2}\,y''(\xi_i), \]

for some number $\xi_i$ in $(t_i, t_{i+1})$. Because $h = t_{i+1} - t_i$, we have

\[ y(t_{i+1}) = y(t_i) + h\,y'(t_i) + \frac{h^2}{2}\,y''(\xi_i), \]

and, because $y(t)$ satisfies the differential equation (5.6),

\[ y(t_{i+1}) = y(t_i) + h f(t_i, y(t_i)) + \frac{h^2}{2}\,y''(\xi_i). \tag{5.7} \]

The use of elementary difference methods to approximate the solution to differential equations was one of the numerous mathematical topics that was first presented to the mathematical public by the most prolific of mathematicians, Leonhard Euler (1707-1783).

Euler’s method constructs $w_i \approx y(t_i)$, for each $i = 1, 2, \ldots, N$, by deleting the remainder
term. Thus Euler’s method is

\[ w_0 = \alpha, \qquad w_{i+1} = w_i + h f(t_i, w_i), \quad \text{for each } i = 0, 1, \ldots, N - 1. \tag{5.8} \]
Illustration  In Example 1 we will use an algorithm for Euler’s method to approximate the solution to

\[ y' = y - t^2 + 1, \quad 0 \le t \le 2, \quad y(0) = 0.5, \]

at $t = 2$. Here we will simply illustrate the steps in the technique when we have $h = 0.5$.
For this problem $f(t, y) = y - t^2 + 1$, so

\[ w_0 = y(0) = 0.5; \]
\[ w_1 = w_0 + 0.5\,(w_0 - (0.0)^2 + 1) = 0.5 + 0.5(1.5) = 1.25; \]
\[ w_2 = w_1 + 0.5\,(w_1 - (0.5)^2 + 1) = 1.25 + 0.5(2.0) = 2.25; \]
\[ w_3 = w_2 + 0.5\,(w_2 - (1.0)^2 + 1) = 2.25 + 0.5(2.25) = 3.375; \]

and

\[ y(2) \approx w_4 = w_3 + 0.5\,(w_3 - (1.5)^2 + 1) = 3.375 + 0.5(2.125) = 4.4375. \]


Equation (5.8) is called the difference equation associated with Euler’s method. As 
we will see later in this chapter, the theory and solution of difference equations parallel, 
in many ways, the theory and solution of differential equations. Algorithm 5.1 implements 
Euler’s method. 


Algorithm 5.1  Euler’s Method

To approximate the solution of the initial-value problem

\[ y' = f(t, y), \quad a \le t \le b, \quad y(a) = \alpha, \]

at $(N + 1)$ equally spaced numbers in the interval $[a, b]$:

INPUT endpoints a, b; integer N; initial condition α.
OUTPUT approximation w to y at the (N + 1) values of t.

Step 1  Set h = (b − a)/N;
            t = a;
            w = α;
        OUTPUT (t, w).

Step 2  For i = 1, 2, ..., N do Steps 3, 4.

    Step 3  Set w = w + h f(t, w);  (Compute w_i.)
                t = a + ih.         (Compute t_i.)

    Step 4  OUTPUT (t, w).

Step 5  STOP.
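The steps of Algorithm 5.1 translate almost line for line into code. The following Python sketch is one possible transcription, not a reference implementation: the function name `euler` and the choice to return a list of `(t, w)` pairs instead of printing them are assumptions made for this illustration.

```python
def euler(f, a, b, alpha, N):
    """Euler's method for y' = f(t, y), a <= t <= b, y(a) = alpha, with N steps.

    Returns the list of pairs (t_i, w_i) for i = 0, 1, ..., N.
    """
    h = (b - a) / N                # Step 1
    t, w = a, alpha
    points = [(t, w)]
    for i in range(1, N + 1):      # Step 2
        w = w + h * f(t, w)        # Step 3: compute w_i from (t_{i-1}, w_{i-1})
        t = a + i * h              #         compute t_i
        points.append((t, w))      # Step 4
    return points                  # Step 5

def f(t, y):
    return y - t**2 + 1

coarse = euler(f, 0.0, 2.0, 0.5, 4)    # h = 0.5, as in the Illustration
fine = euler(f, 0.0, 2.0, 0.5, 10)     # h = 0.2
print(coarse[-1], fine[-1])
```

Computing `t = a + i * h` at each step, as the algorithm specifies, avoids the accumulated rounding that repeated `t = t + h` would introduce.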


To interpret Euler’s method geometrically, note that when $w_i$ is a close approximation
to $y(t_i)$, the assumption that the problem is well-posed implies that

\[ f(t_i, w_i) \approx y'(t_i) = f(t_i, y(t_i)). \]

The graph of the function highlighting $y(t_i)$ is shown in Figure 5.2. One step in Euler’s
method appears in Figure 5.3, and a series of steps appears in Figure 5.4.

Figure 5.2  (Graph of the solution of $y' = f(t, y)$, $y(a) = \alpha$, with the values $y(t_i)$ marked.)

Figure 5.3  (One step of Euler’s method for $y' = f(t, y)$, $y(a) = \alpha$: the line through $(a, \alpha)$ with slope $y'(a) = f(a, \alpha)$.)


Example 1  Euler’s method was used in the first illustration with $h = 0.5$ to approximate the solution
to the initial-value problem

\[ y' = y - t^2 + 1, \quad 0 \le t \le 2, \quad y(0) = 0.5. \]

Use Algorithm 5.1 with $N = 10$ to determine approximations, and compare these with the
exact values given by $y(t) = (t + 1)^2 - 0.5e^t$.

Solution  With $N = 10$ we have $h = 0.2$, $t_i = 0.2i$, $w_0 = 0.5$, and

\[ w_{i+1} = w_i + h(w_i - t_i^2 + 1) = w_i + 0.2[w_i - 0.04i^2 + 1] = 1.2w_i - 0.008i^2 + 0.2, \]

for $i = 0, 1, \ldots, 9$. So

\[ w_1 = 1.2(0.5) - 0.008(0)^2 + 0.2 = 0.8; \qquad w_2 = 1.2(0.8) - 0.008(1)^2 + 0.2 = 1.152; \]

and so on. Table 5.1 shows the comparison between the approximate values at $t_i$ and the
actual values.

Table 5.1
t_i    w_i          y_i = y(t_i)   |y_i - w_i|
0.0    0.5000000    0.5000000      0.0000000
0.2    0.8000000    0.8292986      0.0292986
0.4    1.1520000    1.2140877      0.0620877
0.6    1.5504000    1.6489406      0.0985406
0.8    1.9884800    2.1272295      0.1387495
1.0    2.4581760    2.6408591      0.1826831
1.2    2.9498112    3.1799415      0.2301303
1.4    3.4517734    3.7324000      0.2806266
1.6    3.9501281    4.2834838      0.3333557
1.8    4.4281538    4.8151763      0.3870225
2.0    4.8657845    5.3054720      0.4396874


Note that the error grows slightly as the value of $t$ increases. This controlled error
growth is a consequence of the stability of Euler’s method, which implies that the error is 
expected to grow in no worse than a linear manner. 

Maple has implemented Euler’s method as an option with the command InitialValueProblem
within the NumericalAnalysis subpackage of the Student package. To use
it for the problem in Example 1, first load the package and the differential equation.


with(Student[NumericalAnalysis]): deq := diff(y(t), t) = y(t) - t^2 + 1
Then issue the command


C := InitialValueProblem(deq, y(0) = 0.5, t = 2, method = euler, numsteps = 10,
output = information, digits = 8)


Maple produces 


1..12x 1..4 Array 
Data Type: anything 
Storage: rectangular 
Order: Fortran_order 


Double clicking on the output brings up a table that gives the values of $t_i$, actual solution
values $y(t_i)$, the Euler approximations $w_i$, and the absolute errors $|y(t_i) - w_i|$. These agree
with the values in Table 5.1.

To print the Maple table we can issue the commands


for k from 1 to 12 do
print(C[k, 1], C[k, 2], C[k, 3], C[k, 4])
end do


The options within the InitialValueProblem command are the specification of the first-order
differential equation to be solved, the initial condition, the final value of the independent
variable, the choice of method, the number of steps used to determine that $h = (2 - 0)/(\text{numsteps})$,
the specification of the form of the output, and the number of digits of rounding
to be used in the computations. Other output options can specify a particular value of $t$ or
a plot of the solution.


Error Bounds for Euler’s Method 


Although Euler’s method is not accurate enough to warrant its use in practice, it is sufficiently
elementary to analyze the error that is produced from its application. The error analysis for
the more accurate methods that we consider in subsequent sections follows the same pattern
but is more complicated.

To derive an error bound for Euler’s method, we need two computational lemmas.

Lemma 5.7  For all $x \ge -1$ and any positive $m$, we have $0 \le (1 + x)^m \le e^{mx}$.

Proof  Applying Taylor’s Theorem with $f(x) = e^x$, $x_0 = 0$, and $n = 1$ gives

\[ e^x = 1 + x + \frac{1}{2}x^2e^{\xi}, \]

where $\xi$ is between $x$ and zero. Thus

\[ 0 \le 1 + x \le 1 + x + \frac{1}{2}x^2e^{\xi} = e^x, \]

and, because $1 + x \ge 0$, we have

\[ 0 \le (1 + x)^m \le (e^x)^m = e^{mx}. \]


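Lemma 5.8  If $s$ and $t$ are positive real numbers, $\{a_i\}_{i=0}^{k}$ is a sequence satisfying $a_0 \ge -t/s$, and

\[ a_{i+1} \le (1 + s)a_i + t, \quad \text{for each } i = 0, 1, 2, \ldots, k - 1, \tag{5.9} \]

then

\[ a_{i+1} \le e^{(i+1)s}\left(a_0 + \frac{t}{s}\right) - \frac{t}{s}. \]

Proof  For a fixed integer $i$, Inequality (5.9) implies that

\[ a_{i+1} \le (1 + s)a_i + t \]
\[ \le (1 + s)[(1 + s)a_{i-1} + t] + t = (1 + s)^2a_{i-1} + [1 + (1 + s)]t \]
\[ \le (1 + s)^3a_{i-2} + [1 + (1 + s) + (1 + s)^2]t \]
\[ \vdots \]
\[ \le (1 + s)^{i+1}a_0 + [1 + (1 + s) + (1 + s)^2 + \cdots + (1 + s)^i]t. \]

But

\[ 1 + (1 + s) + (1 + s)^2 + \cdots + (1 + s)^i = \sum_{j=0}^{i}(1 + s)^j \]

is a geometric series with ratio $(1 + s)$ that sums to

\[ \frac{1 - (1 + s)^{i+1}}{1 - (1 + s)} = \frac{1}{s}\left[(1 + s)^{i+1} - 1\right]. \]

Thus

\[ a_{i+1} \le (1 + s)^{i+1}a_0 + \frac{(1 + s)^{i+1} - 1}{s}\,t = (1 + s)^{i+1}\left(a_0 + \frac{t}{s}\right) - \frac{t}{s}, \]

and, because $a_0 + t/s \ge 0$, using Lemma 5.7 with $x = s$ and $m = i + 1$ gives

\[ a_{i+1} \le e^{(i+1)s}\left(a_0 + \frac{t}{s}\right) - \frac{t}{s}. \]

Theorem 5.9  Suppose $f$ is continuous and satisfies a Lipschitz condition with constant $L$ on

\[ D = \{(t, y) \mid a \le t \le b \text{ and } -\infty < y < \infty\} \]

and that a constant $M$ exists with

\[ |y''(t)| \le M, \quad \text{for all } t \in [a, b], \]

where $y(t)$ denotes the unique solution to the initial-value problem

\[ y' = f(t, y), \quad a \le t \le b, \quad y(a) = \alpha. \]

Let $w_0, w_1, \ldots, w_N$ be the approximations generated by Euler’s method for some positive
integer $N$. Then, for each $i = 0, 1, 2, \ldots, N$,

\[ |y(t_i) - w_i| \le \frac{hM}{2L}\left[e^{L(t_i - a)} - 1\right]. \tag{5.10} \]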


Proof  When $i = 0$ the result is clearly true, since $y(t_0) = w_0 = \alpha$.
From Eq. (5.7), we have

\[ y(t_{i+1}) = y(t_i) + hf(t_i, y(t_i)) + \frac{h^2}{2}y''(\xi_i), \]

for $i = 0, 1, \ldots, N - 1$, and from the equations in (5.8),

\[ w_{i+1} = w_i + hf(t_i, w_i). \]

Using the notation $y_i = y(t_i)$ and $y_{i+1} = y(t_{i+1})$, we subtract these two equations to obtain

\[ y_{i+1} - w_{i+1} = y_i - w_i + h[f(t_i, y_i) - f(t_i, w_i)] + \frac{h^2}{2}y''(\xi_i). \]

Hence

\[ |y_{i+1} - w_{i+1}| \le |y_i - w_i| + h\,|f(t_i, y_i) - f(t_i, w_i)| + \frac{h^2}{2}|y''(\xi_i)|. \]

Now $f$ satisfies a Lipschitz condition in the second variable with constant $L$, and
$|y''(t)| \le M$, so

\[ |y_{i+1} - w_{i+1}| \le (1 + hL)|y_i - w_i| + \frac{h^2M}{2}. \]

Referring to Lemma 5.8 and letting $s = hL$, $t = h^2M/2$, and $a_j = |y_j - w_j|$, for each
$j = 0, 1, \ldots, N$, we see that

\[ |y_{i+1} - w_{i+1}| \le e^{(i+1)hL}\left(|y_0 - w_0| + \frac{h^2M}{2hL}\right) - \frac{h^2M}{2hL}. \]

Because $|y_0 - w_0| = 0$ and $(i + 1)h = t_{i+1} - t_0 = t_{i+1} - a$, this implies that

\[ |y_{i+1} - w_{i+1}| \le \frac{hM}{2L}\left(e^{(t_{i+1} - a)L} - 1\right), \]


for each $i = 0, 1, \ldots, N - 1$.

The weakness of Theorem 5.9 lies in the requirement that a bound be known for the
second derivative of the solution. Although this condition often prohibits us from obtaining
a realistic error bound, it should be noted that if $\partial f/\partial t$ and $\partial f/\partial y$ both exist, the chain rule
for partial differentiation implies that

\[ y''(t) = \frac{dy'}{dt}(t) = \frac{df}{dt}(t, y(t)) = \frac{\partial f}{\partial t}(t, y(t)) + \frac{\partial f}{\partial y}(t, y(t)) \cdot f(t, y(t)). \]

So it is at times possible to obtain a bound for $y''(t)$ without explicitly knowing $y(t)$.
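As a small check of the chain-rule formula, the Python sketch below approximates $\partial f/\partial t$ and $\partial f/\partial y$ by central differences for $f(t, y) = y - t^2 + 1$ and compares the resulting value of $y''$ with the value obtained by differentiating the known solution $y(t) = (t + 1)^2 - 0.5e^t$ twice. The function names and the difference step `eps` are assumptions made for illustration. In practice the partial derivatives would usually be computed by hand or symbolically.

```python
import math

def f(t, y):
    return y - t**2 + 1

def second_derivative_via_chain_rule(f, t, y, eps=1e-5):
    """Approximate y'' = df/dt + (df/dy) * f at the point (t, y)."""
    df_dt = (f(t + eps, y) - f(t - eps, y)) / (2 * eps)   # central difference in t
    df_dy = (f(t, y + eps) - f(t, y - eps)) / (2 * eps)   # central difference in y
    return df_dt + df_dy * f(t, y)

def y_exact(t):
    return (t + 1) ** 2 - 0.5 * math.exp(t)

for t in (0.0, 0.5, 1.0, 2.0):
    from_f = second_derivative_via_chain_rule(f, t, y_exact(t))
    from_y = 2 - 0.5 * math.exp(t)        # y'' from the closed-form solution
    print(t, from_f, from_y)
```

For this $f$ the formula gives $y'' = -2t + (y - t^2 + 1)$ by hand, which depends only on $f$ and the point $(t, y)$.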


Example 2  The solution to the initial-value problem

\[ y' = y - t^2 + 1, \quad 0 \le t \le 2, \quad y(0) = 0.5, \]

was approximated in Example 1 using Euler’s method with $h = 0.2$. Use the inequality in
Theorem 5.9 to find bounds for the approximation errors and compare these to the actual
errors.

Solution  Because $f(t, y) = y - t^2 + 1$, we have $\partial f(t, y)/\partial y = 1$ for all $y$, so $L = 1$. For
this problem, the exact solution is $y(t) = (t + 1)^2 - 0.5e^t$, so $y''(t) = 2 - 0.5e^t$ and

\[ |y''(t)| \le 0.5e^2 - 2, \quad \text{for all } t \in [0, 2]. \]

Using the inequality in the error bound for Euler’s method with $h = 0.2$, $L = 1$, and
$M = 0.5e^2 - 2$ gives

\[ |y_i - w_i| \le 0.1(0.5e^2 - 2)(e^{t_i} - 1). \]

Hence

\[ |y(0.2) - w_1| \le 0.1(0.5e^2 - 2)(e^{0.2} - 1) = 0.03752; \]
\[ |y(0.4) - w_2| \le 0.1(0.5e^2 - 2)(e^{0.4} - 1) = 0.08334; \]

and so on. Table 5.2 lists the actual error found in Example 1, together with this error
bound. Note that even though the true bound for the second derivative of the solution was
used, the error bound is considerably larger than the actual error, especially for increasing
values of $t$.


Table 5.2
t_i            0.2      0.4      0.6      0.8      1.0      1.2      1.4      1.6      1.8      2.0
Actual Error   0.02930  0.06209  0.09854  0.13875  0.18268  0.23013  0.28063  0.33336  0.38702  0.43969
Error Bound    0.03752  0.08334  0.13931  0.20767  0.29117  0.39315  0.51771  0.66985  0.85568  1.08264
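The two rows of Table 5.2 can be regenerated with a few lines of code. The Python sketch below applies Euler’s method with $h = 0.2$, evaluates the bound (5.10) with $L = 1$ and $M = 0.5e^2 - 2$, and pairs each actual error with its bound. The function and variable names are assumptions made for this illustration.

```python
import math

def euler(f, a, b, alpha, N):
    """Euler's method; returns the pairs (t_i, w_i) for i = 0, ..., N."""
    h = (b - a) / N
    t, w = a, alpha
    out = [(t, w)]
    for i in range(1, N + 1):
        w += h * f(t, w)
        t = a + i * h
        out.append((t, w))
    return out

def error_bound(h, M, L, t_i, a):
    """Right-hand side of (5.10): (hM / 2L) * (exp(L (t_i - a)) - 1)."""
    return h * M / (2 * L) * (math.exp(L * (t_i - a)) - 1)

def f(t, y):
    return y - t**2 + 1

def y_exact(t):
    return (t + 1) ** 2 - 0.5 * math.exp(t)

M = 0.5 * math.exp(2) - 2      # bound on |y''| over [0, 2] from Example 2
rows = [
    (t, abs(y_exact(t) - w), error_bound(0.2, M, 1.0, t, 0.0))
    for t, w in euler(f, 0.0, 2.0, 0.5, 10)
]
for t, actual, bound in rows[1:]:
    print(f"{t:.1f}  {actual:.5f}  {bound:.5f}")
```

At every mesh point the actual error lies below the bound, and the gap widens as $t$ increases.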


The principal importance of the error-bound formula given in Theorem 5.9 is that the 
bound depends linearly on the step size h. Consequently, diminishing the step size should 
give correspondingly greater accuracy to the approximations. 
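The linear dependence on $h$ can be observed experimentally: halving the step size should roughly halve the error. The Python sketch below measures the error at $t = 2$ for the problem of Example 2 with three step counts. The names and the particular step counts (100, 200, 400) are assumptions chosen for illustration.

```python
import math

def euler_final(f, a, b, alpha, N):
    """Return only the last Euler approximation w_N, which approximates y(b)."""
    h = (b - a) / N
    w = alpha
    for i in range(N):
        w += h * f(a + i * h, w)
    return w

def f(t, y):
    return y - t**2 + 1

def y_exact(t):
    return (t + 1) ** 2 - 0.5 * math.exp(t)

def final_error(N):
    """Absolute error at t = 2 when N Euler steps are used on [0, 2]."""
    return abs(y_exact(2.0) - euler_final(f, 0.0, 2.0, 0.5, N))

errors = {N: final_error(N) for N in (100, 200, 400)}
for N in (100, 200):
    # Each ratio compares step size h with step size h/2.
    print(N, errors[N] / errors[2 * N])
```

Ratios close to 2 are what first-order behavior predicts. They approach 2 more closely as $h$ decreases, until round-off error starts to matter.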

Neglected in the result of Theorem 5.9 is the effect that round-off error plays in the
choice of step size. As $h$ becomes smaller, more calculations are necessary and more round-off
error is expected. In actuality then, the difference-equation form

\[ w_0 = \alpha, \qquad w_{i+1} = w_i + hf(t_i, w_i), \quad \text{for each } i = 0, 1, \ldots, N - 1, \]

is not used to calculate the approximation to the solution $y_i$ at a mesh point $t_i$. We use instead
an equation of the form

\[ u_0 = \alpha + \delta_0, \qquad u_{i+1} = u_i + hf(t_i, u_i) + \delta_{i+1}, \quad \text{for each } i = 0, 1, \ldots, N - 1, \tag{5.11} \]

where $\delta_i$ denotes the round-off error associated with $u_i$. Using methods similar to those in
the proof of Theorem 5.9, we can produce an error bound for the finite-digit approximations
to $y_i$ given by Euler’s method.


Theorem 5.10  Let $y(t)$ denote the unique solution to the initial-value problem

\[ y' = f(t, y), \quad a \le t \le b, \quad y(a) = \alpha, \tag{5.12} \]

and $u_0, u_1, \ldots, u_N$ be the approximations obtained using (5.11). If $|\delta_i| < \delta$ for each
$i = 0, 1, \ldots, N$ and the hypotheses of Theorem 5.9 hold for (5.12), then

\[ |y(t_i) - u_i| \le \frac{1}{L}\left(\frac{hM}{2} + \frac{\delta}{h}\right)\left[e^{L(t_i - a)} - 1\right] + |\delta_0|\,e^{L(t_i - a)}, \tag{5.13} \]

for each $i = 0, 1, \ldots, N$.


The error bound (5.13) is no longer linear in $h$. In fact, since

\[ \lim_{h \to 0}\left(\frac{hM}{2} + \frac{\delta}{h}\right) = \infty, \]

the error would be expected to become large for sufficiently small values of $h$. Calculus can
be used to determine a lower bound for the step size $h$. Letting $E(h) = (hM/2) + (\delta/h)$
implies that $E'(h) = (M/2) - (\delta/h^2)$.

If $h < \sqrt{2\delta/M}$, then $E'(h) < 0$ and $E(h)$ is decreasing.
If $h > \sqrt{2\delta/M}$, then $E'(h) > 0$ and $E(h)$ is increasing.

The minimal value of $E(h)$ occurs when

\[ h = \sqrt{\frac{2\delta}{M}}. \tag{5.14} \]

Decreasing $h$ beyond this value tends to increase the total error in the approximation.
Normally, however, the value of $\delta$ is sufficiently small that this lower bound for $h$ does not
affect the operation of Euler’s method.
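The trade-off between truncation and round-off error can be made concrete by evaluating $E(h)$ near the step size given by (5.14). In the Python sketch below, the value $\delta = 10^{-10}$ is a hypothetical round-off level chosen only for illustration, and $M = 0.5e^2 - 2$ is the second-derivative bound from Example 2. The function names are likewise assumptions.

```python
import math

def total_error_factor(h, M, delta):
    """E(h) = hM/2 + delta/h, the h-dependent factor in the bound (5.13)."""
    return h * M / 2 + delta / h

def optimal_step(M, delta):
    """Step size from (5.14) that minimizes E(h)."""
    return math.sqrt(2 * delta / M)

M = 0.5 * math.exp(2) - 2      # bound on |y''| from Example 2
delta = 1e-10                  # hypothetical round-off level (assumption)

h_star = optimal_step(M, delta)
print(h_star, total_error_factor(h_star, M, delta))
for factor in (0.1, 0.5, 2.0, 10.0):
    # E(h) is larger on either side of h_star.
    print(factor, total_error_factor(factor * h_star, M, delta))
```

Substituting (5.14) into $E(h)$ gives the minimum value $E(h) = \sqrt{2\delta M}$, which the computed value reproduces.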


EXERCISE SET 5.2 


1. Use Euler’s method to approximate the solutions for each of the following initial-value problems.
   a. $y' = te^{3t} - 2y$, $0 \le t \le 1$, $y(0) = 0$, with $h = 0.5$
   b. $y' = 1 + (t - y)^2$, $2 \le t \le 3$, $y(2) = 1$, with $h = 0.5$
   c. $y' = 1 + y/t$, $1 \le t \le 2$, $y(1) = 2$, with $h = 0.25$
   d. $y' = \cos 2t + \sin 3t$, $0 \le t \le 1$, $y(0) = 1$, with $h = 0.25$

2. Use Euler’s method to approximate the solutions for each of the following initial-value problems.
   a. $y' = e^{t-y}$, $0 \le t \le 1$, $y(0) = 1$, with $h = 0.5$
   b. $y' = \frac{1 + t}{1 + y}$, $1 \le t \le 2$, $y(1) = 2$, with $h = 0.5$
   c. $y' = -y + ty^{1/2}$, $2 \le t \le 3$, $y(2) = 2$, with $h = 0.25$
   d. $y' = t^{-2}(\sin 2t - 2ty)$, $1 \le t \le 2$, $y(1) = 2$, with $h = 0.25$


3. The actual solutions to the initial-value problems in Exercise 1 are given here. Compare the actual
error at each step to the error bound.
   a. $y(t) = \frac{1}{5}te^{3t} - \frac{1}{25}e^{3t} + \frac{1}{25}e^{-2t}$
   b. $y(t) = t + \frac{1}{1 - t}$
   c. $y(t) = t\ln t + 2t$
   d. $y(t) = \frac{1}{2}\sin 2t - \frac{1}{3}\cos 3t + \frac{4}{3}$

4. The actual solutions to the initial-value problems in Exercise 2 are given here. Compute the actual
error and compare this to the error bound if Theorem 5.9 can be applied.
   a. $y(t) = \ln(e^t + e - 1)$
   b. $y(t) = \sqrt{t^2 + 2t + 6} - 1$
   c. $y(t) = \left(t - 2 + \sqrt{2}\,e\,e^{-t/2}\right)^2$
   d. $y(t) = \frac{4 + \cos 2 - \cos 2t}{2t^2}$


5. Use Euler’s method to approximate the solutions for each of the following initial-value problems.
   a. $y' = y/t - (y/t)^2$, $1 \le t \le 2$, $y(1) = 1$, with $h = 0.1$
   b. $y' = 1 + y/t + (y/t)^2$, $1 \le t \le 3$, $y(1) = 0$, with $h = 0.2$
   c. $y' = -(y + 1)(y + 3)$, $0 \le t \le 2$, $y(0) = -2$, with $h = 0.2$
   d. $y' = -5y + 5t^2 + 2t$, $0 \le t \le 1$, $y(0) = \frac{1}{3}$, with $h = 0.1$

6. Use Euler’s method to approximate the solutions for each of the following initial-value problems.
   a. $y' = \frac{2 - 2ty}{t^2 + 1}$, $0 \le t \le 1$, $y(0) = 1$, with $h = 0.1$
   b. $y' = \frac{y^2}{1 + t}$, $1 \le t \le 2$, $y(1) = -(\ln 2)^{-1}$, with $h = 0.1$
   c. $y' = (y^2 + y)/t$, $1 \le t \le 3$, $y(1) = -2$, with $h = 0.2$
   d. $y' = -ty + 4ty^{-1}$, $0 \le t \le 1$, $y(0) = 1$, with $h = 0.1$

7. The actual solutions to the initial-value problems in Exercise 5 are given here. Compute the actual
error in the approximations of Exercise 5.
   a. $y(t) = \frac{t}{1 + \ln t}$
   b. $y(t) = t\tan(\ln t)$
   c. $y(t) = -3 + \frac{2}{1 + e^{-2t}}$
   d. $y(t) = t^2 + \frac{1}{3}e^{-5t}$

8. The actual solutions to the initial-value problems in Exercise 6 are given here. Compute the actual
error in the approximations of Exercise 6.
   a. $y(t) = \frac{2t + 1}{t^2 + 1}$
   b. $y(t) = \frac{-1}{\ln(t + 1)}$
   c. $y(t) = \frac{2t}{1 - 2t}$
   d. $y(t) = \sqrt{4 - 3e^{-t^2}}$


9. Given the initial-value problem
   \[ y' = \frac{2}{t}y + t^2e^t, \quad 1 \le t \le 2, \quad y(1) = 0, \]
   with exact solution $y(t) = t^2(e^t - e)$:
   a. Use Euler’s method with $h = 0.1$ to approximate the solution, and compare it with the actual
      values of $y$.
   b. Use the answers generated in part (a) and linear interpolation to approximate the following values
      of $y$, and compare them to the actual values.
      i. $y(1.04)$   ii. $y(1.55)$   iii. $y(1.97)$
   c. Compute the value of $h$ necessary for $|y(t_i) - w_i| \le 0.1$, using Eq. (5.10).

10. Given the initial-value problem


   with exact solution $y(t) = -1/t$:
   a. Use Euler’s method with $h = 0.05$ to approximate the solution, and compare it with the actual
      values of $y$.
   b. Use the answers generated in part (a) and linear interpolation to approximate the following values
      of $y$, and compare them to the actual values.
      i. $y(1.052)$   ii. $y(1.555)$   iii. $y(1.978)$
   c. Compute the value of $h$ necessary for $|y(t_i) - w_i| \le 0.05$ using Eq. (5.10).

11. Given the initial-value problem
   \[ y' = -y + t + 1, \quad 0 \le t \le 5, \quad y(0) = 1, \]
   with exact solution $y(t) = e^{-t} + t$:
   a. Approximate $y(5)$ using Euler’s method with $h = 0.2$, $h = 0.1$, and $h = 0.05$.
   b. Determine the optimal value of $h$ to use in computing $y(5)$, assuming $\delta = 10^{-6}$ and that Eq. (5.14)
      is valid.

12. Consider the initial-value problem
   \[ y' = -10y, \quad 0 \le t \le 2, \quad y(0) = 1, \]
   which has solution $y(t) = e^{-10t}$. What happens when Euler’s method is applied to this problem with
   $h = 0.1$? Does this behavior violate Theorem 5.9?

13. Use the results of Exercise 5 and linear interpolation to approximate the following values of $y(t)$.
   Compare the approximations obtained to the actual values obtained using the functions given in
   Exercise 7.
   a. $y(1.25)$ and $y(1.93)$   b. $y(2.1)$ and $y(2.75)$
   c. $y(1.3)$ and $y(1.93)$   d. $y(0.54)$ and $y(0.94)$

14. Use the results of Exercise 6 and linear interpolation to approximate the following values of $y(t)$.
   Compare the approximations obtained to the actual values obtained using the functions given in
   Exercise 8.
   a. $y(0.25)$ and $y(0.93)$   b. $y(1.25)$ and $y(1.93)$
   c. $y(2.10)$ and $y(2.75)$   d. $y(0.54)$ and $y(0.94)$

15. Let $E(h) = \frac{hM}{2} + \frac{\delta}{h}$.
   a. For the initial-value problem
      \[ y' = -y + 1, \quad 0 \le t \le 1, \quad y(0) = 0, \]
      compute the value of $h$ to minimize $E(h)$. Assume $\delta = 5 \times 10^{-(n+1)}$ if you will be using $n$-digit
      arithmetic in part (c).
   b. For the optimal $h$ computed in part (a), use Eq. (5.13) to compute the minimal error obtainable.
   c. Compare the actual error obtained using $h = 0.1$ and $h = 0.01$ to the minimal error in part (b).
      Can you explain the results?
16. In a circuit with impressed voltage $\mathcal{E}$ having resistance $R$, inductance $L$, and capacitance $C$ in parallel,
   the current $i$ satisfies the differential equation
   \[ \frac{di}{dt} = C\frac{d^2\mathcal{E}}{dt^2} + \frac{1}{R}\frac{d\mathcal{E}}{dt} + \frac{1}{L}\mathcal{E}. \]
   Suppose $C = 0.3$ farads, $R = 1.4$ ohms, $L = 1.7$ henries, and the voltage is given by
   \[ \mathcal{E}(t) = e^{-0.06\pi t}\sin(2t - \pi). \]
   If $i(0) = 0$, find the current $i$ for the values $t = 0.1j$, where $j = 0, 1, \ldots, 100$.

17. In a book entitled Looking at History Through Mathematics, Rashevsky [Ra], pp. 103-110, considers
   a model for a problem involving the production of nonconformists in society. Suppose that a society
   has a population of $x(t)$ individuals at time $t$, in years, and that all nonconformists who mate with
   other nonconformists have offspring who are also nonconformists, while a fixed proportion $r$ of all
   other offspring are also nonconformist. If the birth and death rates for all individuals are assumed to
   be the constants $b$ and $d$, respectively, and if conformists and nonconformists mate at random, the
   problem can be expressed by the differential equations
   \[ \frac{dx(t)}{dt} = (b - d)x(t) \quad \text{and} \quad \frac{dx_n(t)}{dt} = (b - d)x_n(t) + rb(x(t) - x_n(t)), \]
   where $x_n(t)$ denotes the number of nonconformists in the population at time $t$.
   a. Suppose the variable $p(t) = x_n(t)/x(t)$ is introduced to represent the proportion of nonconformists
      in the society at time $t$. Show that these equations can be combined and simplified to
      the single differential equation
      \[ \frac{dp(t)}{dt} = rb(1 - p(t)). \]
   b. Assuming that $p(0) = 0.01$, $b = 0.02$, $d = 0.015$, and $r = 0.1$, approximate the solution $p(t)$
      from $t = 0$ to $t = 50$ when the step size is $h = 1$ year.
   c. Solve the differential equation for $p(t)$ exactly, and compare your result in part (b) when $t = 50$
      with the exact value at that time.


5.3 Higher-Order Taylor Methods


Since the object of a numerical technique is to determine accurate approximations with
minimal effort, we need a means for comparing the efficiency of various approximation 
methods. The first device we consider is called the local truncation error of the method. 

The local truncation error at a specified step measures the amount by which the exact 
solution to the differential equation fails to satisfy the difference equation being used for 
the approximation at that step. This might seem like an unlikely way to compare the error 
of various methods. We really want to know how well the approximations generated by the 
methods satisfy the differential equation, not the other way around. However, we don’t know 
the exact solution so we cannot generally determine this, and the local truncation error will serve
quite well to determine not only the local error of a method but the actual approximation 
error. 

Consider the initial-value problem

\[ y' = f(t, y), \quad a \le t \le b, \quad y(a) = \alpha. \]


Definition 5.11  The difference method

\[ w_0 = \alpha, \qquad w_{i+1} = w_i + h\phi(t_i, w_i), \quad \text{for each } i = 0, 1, \ldots, N - 1, \]

has local truncation error

\[ \tau_{i+1}(h) = \frac{y_{i+1} - (y_i + h\phi(t_i, y_i))}{h} = \frac{y_{i+1} - y_i}{h} - \phi(t_i, y_i), \]

for each $i = 0, 1, \ldots, N - 1$, where $y_i$ and $y_{i+1}$ denote the solution at $t_i$ and $t_{i+1}$,
respectively.

The methods in this section use Taylor polynomials and the knowledge of the derivative at a node to approximate the value of the function at a new node.


For example, Euler’s method has local truncation error at the ith step 


$$\tau_{i+1}(h) = \frac{y_{i+1} - y_i}{h} - f(t_i, y_i), \quad \text{for each } i = 0, 1, \ldots, N-1.$$

This error is a local error because it measures the accuracy of the method at a specific 
step, assuming that the method was exact at the previous step. As such, it depends on the 
differential equation, the step size, and the particular step in the approximation. 

By considering Eq. (5.7) in the previous section, we see that Euler’s method has 


$$\tau_{i+1}(h) = \frac{h}{2}\,y''(\xi_i), \quad \text{for some } \xi_i \text{ in } (t_i, t_{i+1}).$$

When $y''(t)$ is known to be bounded by a constant $M$ on $[a, b]$, this implies

$$|\tau_{i+1}(h)| \le \frac{h}{2}M,$$


so the local truncation error in Euler’s method is O(h). 
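The O(h) behavior can be checked numerically. The following is a minimal sketch, not part of the original text: it assumes the test problem y' = y - t^2 + 1, y(0) = 0.5 used in the examples of this chapter, together with its closed-form solution y(t) = (t + 1)^2 - 0.5e^t, which is an assumption added here for illustration. The function names are hypothetical.

```python
import math

def f(t, y):
    """Right-hand side of the assumed test problem y' = y - t^2 + 1."""
    return y - t * t + 1.0

def exact(t):
    """Assumed closed-form solution of the test problem with y(0) = 0.5."""
    return (t + 1.0) ** 2 - 0.5 * math.exp(t)

def euler_local_truncation_error(t_i, h):
    """tau_{i+1}(h) = (y_{i+1} - y_i)/h - f(t_i, y_i), using exact solution values."""
    return (exact(t_i + h) - exact(t_i)) / h - f(t_i, exact(t_i))

# Bound M on |y''(t)| = |2 - 0.5 e^t| over [0, 2]; the largest magnitude is at t = 2.
M = abs(2.0 - 0.5 * math.exp(2.0))

for h in (0.2, 0.1, 0.05):
    tau = euler_local_truncation_error(0.0, h)
    print(f"h = {h:5.3f}  tau_1(h) = {tau:.6f}  bound (h/2)M = {0.5 * h * M:.6f}")
```

Halving h roughly halves the computed value, and each value stays below (h/2)M, which is what the bound above predicts.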

One way to select difference-equation methods for solving ordinary differential equations
is in such a manner that their local truncation errors are $O(h^p)$ for as large a value
of $p$ as possible, while keeping the number and complexity of calculations of the methods
within a reasonable bound.

Since Euler’s method was derived by using Taylor’s Theorem with $n = 1$ to approximate
the solution of the differential equation, our first attempt to find methods for improving the
convergence properties of difference methods is to extend this technique of derivation to
larger values of $n$.

Suppose the solution $y(t)$ to the initial-value problem

$$y' = f(t, y), \quad a \le t \le b, \quad y(a) = \alpha,$$

has $(n + 1)$ continuous derivatives. If we expand the solution, $y(t)$, in terms of its $n$th Taylor
polynomial about $t_i$ and evaluate at $t_{i+1}$, we obtain

$$y(t_{i+1}) = y(t_i) + h\,y'(t_i) + \frac{h^2}{2}\,y''(t_i) + \cdots + \frac{h^n}{n!}\,y^{(n)}(t_i) + \frac{h^{n+1}}{(n+1)!}\,y^{(n+1)}(\xi_i), \tag{5.15}$$

for some $\xi_i$ in $(t_i, t_{i+1})$.

Successive differentiation of the solution, $y(t)$, gives

$$y'(t) = f(t, y(t)), \quad y''(t) = f'(t, y(t)), \quad \text{and, generally,} \quad y^{(k)}(t) = f^{(k-1)}(t, y(t)).$$

Substituting these results into Eq. (5.15) gives

$$y(t_{i+1}) = y(t_i) + h\,f(t_i, y(t_i)) + \frac{h^2}{2}\,f'(t_i, y(t_i)) + \cdots + \frac{h^n}{n!}\,f^{(n-1)}(t_i, y(t_i)) + \frac{h^{n+1}}{(n+1)!}\,f^{(n)}(\xi_i, y(\xi_i)). \tag{5.16}$$

The difference-equation method corresponding to Eq. (5.16) is obtained by deleting
the remainder term involving $\xi_i$.
Taylor method of order n

$$w_0 = \alpha,$$
$$w_{i+1} = w_i + h\,T^{(n)}(t_i, w_i), \quad \text{for each } i = 0, 1, \ldots, N-1, \tag{5.17}$$

where

$$T^{(n)}(t_i, w_i) = f(t_i, w_i) + \frac{h}{2}\,f'(t_i, w_i) + \cdots + \frac{h^{n-1}}{n!}\,f^{(n-1)}(t_i, w_i).$$

Euler’s method is Taylor’s method of order one.

Table 5.3  Taylor’s method of order 2

 t_i    w_i         Error |y(t_i) - w_i|
 0.0    0.500000    0
 0.2    0.830000    0.000701
 0.4    1.215800    0.001712
 0.6    1.652076    0.003135
 0.8    2.132333    0.005103
 1.0    2.648646    0.007787
 1.2    3.191348    0.011407
 1.4    3.748645    0.016245
 1.6    4.306146    0.022663
 1.8    4.846299    0.031122
 2.0    5.347684    0.042212

Example 1 Apply Taylor’s method of orders (a) two and (b) four with $N = 10$ to the
initial-value problem

$$y' = y - t^2 + 1, \quad 0 \le t \le 2, \quad y(0) = 0.5.$$


Solution (a) For the method of order two we need the first derivative of
$f(t, y(t)) = y(t) - t^2 + 1$ with respect to the variable $t$. Because $y' = y - t^2 + 1$ we have

$$f'(t, y(t)) = \frac{d}{dt}(y - t^2 + 1) = y' - 2t = y - t^2 + 1 - 2t,$$

so

$$\begin{aligned}
T^{(2)}(t_i, w_i) &= f(t_i, w_i) + \frac{h}{2}\,f'(t_i, w_i) = w_i - t_i^2 + 1 + \frac{h}{2}(w_i - t_i^2 + 1 - 2t_i) \\
&= \left(1 + \frac{h}{2}\right)(w_i - t_i^2 + 1) - h t_i.
\end{aligned}$$

Because $N = 10$ we have $h = 0.2$, and $t_i = 0.2i$ for each $i = 1, 2, \ldots, 10$. Thus the
second-order method becomes

$$\begin{aligned}
w_0 &= 0.5, \\
w_{i+1} &= w_i + h\left[\left(1 + \frac{h}{2}\right)(w_i - t_i^2 + 1) - h t_i\right] \\
&= w_i + 0.2\left[\left(1 + \frac{0.2}{2}\right)(w_i - 0.04i^2 + 1) - 0.04i\right] \\
&= 1.22w_i - 0.0088i^2 - 0.008i + 0.22.
\end{aligned}$$

The first two steps (i = 0 and i = 1) give the approximations

$$y(0.2) \approx w_1 = 1.22(0.5) - 0.0088(0)^2 - 0.008(0) + 0.22 = 0.83;$$
$$y(0.4) \approx w_2 = 1.22(0.83) - 0.0088(1)^2 - 0.008(1) + 0.22 = 1.2158.$$

All the approximations and their errors are shown in Table 5.3.
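The Taylor method of order n can be sketched in a few lines once the derivatives of f along the solution are supplied by hand. The block below is an illustrative sketch rather than the book's algorithm: the function name `taylor_method`, the list-of-derivatives interface, and the closed-form solution used for the error column are all assumptions introduced here. The second and third derivatives are the ones worked out in part (b) of this example.

```python
import math
from math import factorial

def taylor_method(derivs, a, b, alpha, N):
    """Taylor method of order n = len(derivs) (hypothetical helper).

    derivs[k](t, y) must return the k-th total derivative f^(k)(t, y(t)),
    so derivs[0] is f itself. Returns the nodes t_i and approximations w_i.
    """
    h = (b - a) / N
    ts, ws = [a], [alpha]
    for i in range(N):
        t, w = ts[-1], ws[-1]
        # T^(n)(t, w) = sum over k of h^k / (k+1)! * f^(k)(t, w)
        T = sum(h ** k / factorial(k + 1) * d(t, w) for k, d in enumerate(derivs))
        ws.append(w + h * T)
        ts.append(a + (i + 1) * h)
    return ts, ws

# f(t, y) = y - t^2 + 1 and its total derivatives along the solution.
def f0(t, y): return y - t * t + 1.0
def f1(t, y): return y - t * t + 1.0 - 2.0 * t
def f2(t, y): return y - t * t - 2.0 * t - 1.0
f3 = f2  # the third derivative has the same expression as the second

def exact(t):
    """Assumed closed-form solution, used only to report errors."""
    return (t + 1.0) ** 2 - 0.5 * math.exp(t)

ts, w1 = taylor_method([f0], 0.0, 2.0, 0.5, 10)            # order one = Euler
_, w2 = taylor_method([f0, f1], 0.0, 2.0, 0.5, 10)          # order two
_, w4 = taylor_method([f0, f1, f2, f3], 0.0, 2.0, 0.5, 10)  # order four

for t, a2, a4 in zip(ts, w2, w4):
    print(f"{t:.1f}  {a2:.6f}  {abs(exact(t) - a2):.6f}  {a4:.6f}  {abs(exact(t) - a4):.6f}")
```

Passing a single function reproduces Euler's method, which matches the remark that Euler's method is Taylor's method of order one.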


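(b) For Taylor’s method of order four we need the first three derivatives of $f(t, y(t))$
with respect to $t$. Again using $y' = y - t^2 + 1$ we have

$$f'(t, y(t)) = y - t^2 + 1 - 2t,$$

$$f''(t, y(t)) = \frac{d}{dt}(y - t^2 + 1 - 2t) = y' - 2t - 2 = y - t^2 + 1 - 2t - 2 = y - t^2 - 2t - 1,$$

and

$$f'''(t, y(t)) = \frac{d}{dt}(y - t^2 - 2t - 1) = y' - 2t - 2 = y - t^2 - 2t - 1,$$

so

$$\begin{aligned}
T^{(4)}(t_i, w_i) &= f(t_i, w_i) + \frac{h}{2}\,f'(t_i, w_i) + \frac{h^2}{6}\,f''(t_i, w_i) + \frac{h^3}{24}\,f'''(t_i, w_i) \\
&= w_i - t_i^2 + 1 + \frac{h}{2}(w_i - t_i^2 + 1 - 2t_i) + \frac{h^2}{6}(w_i - t_i^2 - 2t_i - 1) + \frac{h^3}{24}(w_i - t_i^2 - 2t_i - 1) \\
&= \left(1 + \frac{h}{2} + \frac{h^2}{6} + \frac{h^3}{24}\right)(w_i - t_i^2) - \left(1 + \frac{h}{3} + \frac{h^2}{12}\right)(h t_i) + 1 + \frac{h}{2} - \frac{h^2}{6} - \frac{h^3}{24}.
\end{aligned}$$

Hence Taylor’s method of order four is

$$\begin{aligned}
w_0 &= 0.5, \\
w_{i+1} &= w_i + h\left[\left(1 + \frac{h}{2} + \frac{h^2}{6} + \frac{h^3}{24}\right)(w_i - t_i^2) - \left(1 + \frac{h}{3} + \frac{h^2}{12}\right)h t_i + 1 + \frac{h}{2} - \frac{h^2}{6} - \frac{h^3}{24}\right],
\end{aligned}$$

for $i = 0, 1, \ldots, N-1$.

Because $N = 10$ and $h = 0.2$ the method becomes

$$\begin{aligned}
w_{i+1} &= w_i + 0.2\left[\left(1 + \frac{0.2}{2} + \frac{0.04}{6} + \frac{0.008}{24}\right)(w_i - 0.04i^2) - \left(1 + \frac{0.2}{3} + \frac{0.04}{12}\right)(0.04i) + 1 + \frac{0.2}{2} - \frac{0.04}{6} - \frac{0.008}{24}\right] \\
&= 1.2214w_i - 0.008856i^2 - 0.00856i + 0.2186,
\end{aligned}$$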


for each $i = 0, 1, \ldots, 9$. The first two steps (i = 0 and i = 1) give the approximations

$$y(0.2) \approx w_1 = 1.2214(0.5) - 0.008856(0)^2 - 0.00856(0) + 0.2186 = 0.8293;$$
$$y(0.4) \approx w_2 = 1.2214(0.8293) - 0.008856(1)^2 - 0.00856(1) + 0.2186 = 1.214091.$$

All the approximations and their errors are shown in Table 5.4.

Table 5.4  Taylor’s method of order 4

 t_i    w_i         Error |y(t_i) - w_i|
 0.0    0.500000    0
 0.2    0.829300    0.000001
 0.4    1.214091    0.000003
 0.6    1.648947    0.000006
 0.8    2.127240    0.000010
 1.0    2.640874    0.000015
 1.2    3.179964    0.000023
 1.4    3.732432    0.000032
 1.6    4.283529    0.000045
 1.8    4.815238    0.000062

Compare these results with those of Taylor’s method of order 2 in Table 5.3 and you
will see that the fourth-order results are vastly superior.

The results from Table 5.4 indicate that the Taylor’s method of order 4 results are quite
accurate at the nodes 0.2, 0.4, etc. But suppose we need to determine an approximation to
an intermediate point in the table, for example, at t = 1.25. If we use linear interpolation
on the Taylor method of order four approximations at t = 1.2 and t = 1.4, we have

$$y(1.25) \approx \left(\frac{1.25 - 1.4}{1.2 - 1.4}\right)3.1799640 + \left(\frac{1.25 - 1.2}{1.4 - 1.2}\right)3.7324321 = 3.3180810.$$
[Margin note: Hermite interpolation requires both the value of the function and its derivative
at each node. This makes it a natural interpolation method for approximating differential
equations since these data are all available.]


The true value is y(1.25) = 3.3173285, so this approximation has an error of 0.0007525, 
which is nearly 30 times the average of the approximation errors at 1.2 and 1.4. 

We can significantly improve the approximation by using cubic Hermite interpolation.
To determine this approximation for y(1.25) requires approximations to y'(1.2) and y'(1.4)
as well as approximations to y(1.2) and y(1.4). However, the approximations for y(1.2) and
y(1.4) are in the table, and the derivative approximations are available from the differential
equation, because $y'(t) = f(t, y(t))$. In our example $y'(t) = y(t) - t^2 + 1$, so

$$y'(1.2) = y(1.2) - (1.2)^2 + 1 \approx 3.1799640 - 1.44 + 1 = 2.7399640$$

and

$$y'(1.4) = y(1.4) - (1.4)^2 + 1 \approx 3.7324321 - 1.96 + 1 = 2.7724321.$$


The divided-difference procedure in Section 3.4 gives the information in Table 5.5.
The entries marked with an asterisk come from the data, and the other entries use the
divided-difference formulas.

Table 5.5

 z       f[z]          First         Second       Third
 1.2*    3.1799640*
                       2.7399640*
 1.2*    3.1799640*                  0.1118825
                       2.7623405                  -0.3071225
 1.4*    3.7324321*                  0.0504580
                       2.7724321*
 1.4*    3.7324321*


The cubic Hermite polynomial is

$$y(t) \approx 3.1799640 + (t - 1.2)\,2.7399640 + (t - 1.2)^2\,0.1118825 + (t - 1.2)^2(t - 1.4)(-0.3071225),$$

so

$$y(1.25) \approx 3.1799640 + 0.1369982 + 0.0002797 + 0.0001152 = 3.3173571,$$

a result that is accurate to within 0.0000286. This is about the average of the errors at 1.2
and at 1.4, and only 4% of the error obtained using linear interpolation. This improvement
in accuracy certainly justifies the added computation required for the Hermite method.
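The two interpolation steps above can be reproduced with a short sketch. It is illustrative only: the helper names `linear` and `hermite_cubic` are hypothetical, the node values are the order-four Taylor approximations quoted in the text, and the Hermite coefficients are formed with the same divided differences as Table 5.5.

```python
def linear(t0, y0, t1, y1, t):
    """Linear interpolation between the nodes (t0, y0) and (t1, y1)."""
    return y0 * (t - t1) / (t0 - t1) + y1 * (t - t0) / (t1 - t0)

def hermite_cubic(t0, y0, d0, t1, y1, d1, t):
    """Cubic Hermite interpolant with values y0, y1 and slopes d0, d1."""
    h = t1 - t0
    first = (y1 - y0) / h           # f[t0, t1]
    second_a = (first - d0) / h     # f[t0, t0, t1]
    second_b = (d1 - first) / h     # f[t0, t1, t1]
    third = (second_b - second_a) / h
    return (y0 + d0 * (t - t0) + second_a * (t - t0) ** 2
            + third * (t - t0) ** 2 * (t - t1))

def f(t, y):
    """Right-hand side of the example, used to supply the slopes."""
    return y - t * t + 1.0

# Order-four Taylor approximations at the two nodes, as quoted in the text.
t0, w0 = 1.2, 3.1799640
t1, w1 = 1.4, 3.7324321
true_value = 3.3173285  # y(1.25) as quoted in the text

lin = linear(t0, w0, t1, w1, 1.25)
herm = hermite_cubic(t0, w0, f(t0, w0), t1, w1, f(t1, w1), 1.25)

print(f"linear : {lin:.7f}  error {abs(true_value - lin):.7f}")
print(f"Hermite: {herm:.7f}  error {abs(true_value - herm):.7f}")
```

The slopes cost nothing extra because the differential equation supplies them directly from the tabulated values.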


Theorem 5.12 If Taylor’s method of order n is used to approximate the solution to

$$y'(t) = f(t, y(t)), \quad a \le t \le b, \quad y(a) = \alpha,$$

with step size h and if $y \in C^{n+1}[a, b]$, then the local truncation error is $O(h^n)$.


Proof Note that Eq. (5.16) can be rewritten

$$y_{i+1} - y_i - h\,f(t_i, y_i) - \frac{h^2}{2}\,f'(t_i, y_i) - \cdots - \frac{h^n}{n!}\,f^{(n-1)}(t_i, y_i) = \frac{h^{n+1}}{(n+1)!}\,f^{(n)}(\xi_i, y(\xi_i)),$$

for some $\xi_i$ in $(t_i, t_{i+1})$. So the local truncation error is

$$\tau_{i+1}(h) = \frac{y_{i+1} - y_i}{h} - T^{(n)}(t_i, y_i) = \frac{h^n}{(n+1)!}\,f^{(n)}(\xi_i, y(\xi_i)),$$

for each $i = 0, 1, \ldots, N-1$. Since $y \in C^{n+1}[a, b]$, we have $y^{(n+1)}(t) = f^{(n)}(t, y(t))$ bounded
on $[a, b]$ and $\tau_i(h) = O(h^n)$, for each $i = 1, 2, \ldots, N$.
Taylor’s methods are options within the Maple command InitialValueProblem. The
form and output for Taylor’s methods are the same as available under Euler’s method, as
discussed in Section 5.1. To obtain Taylor’s method of order 2 for the problem in Example 1,
first load the package and the differential equation.


with(Student[NumericalAnalysis]) : deq := diff(y(t), t) = y(t) - t^2 + 1

Then issue


C := InitialValueProblem(deq, y(0) = 0.5, t = 2, method = taylor, order = 2,
numsteps = 10, output = information, digits = 8)


Maple responds with an array of data similar to that produced with Euler’s method. Double
clicking on the output will bring up a table that gives the values of $t_i$, actual solution values
$y(t_i)$, the Taylor approximations $w_i$, and the absolute errors $|y(t_i) - w_i|$. These agree with
the values in Table 5.3.

To print the table issue the commands


for k from 1 to 12 do
print(C[k, 1], C[k, 2], C[k, 3], C[k, 4])
end do


EXERCISE SET 5.3

1. Use Taylor’s method of order two to approximate the solutions for each of the following
initial-value problems.
a. $y' = te^{3t} - 2y$, $0 \le t \le 1$, $y(0) = 0$, with $h = 0.5$
b. $y' = 1 + (t - y)^2$, $2 \le t \le 3$, $y(2) = 1$, with $h = 0.5$
c. $y' = 1 + y/t$, $1 \le t \le 2$, $y(1) = 2$, with $h = 0.25$
d. $y' = \cos 2t + \sin 3t$, $0 \le t \le 1$, $y(0) = 1$, with $h = 0.25$

2. Use Taylor’s method of order two to approximate the solutions for each of the following
initial-value problems.
a. $y' = e^{t-y}$, $0 \le t \le 1$, $y(0) = 1$, with $h = 0.5$
b. $y' = \dfrac{1 + t}{1 + y}$, $1 \le t \le 2$, $y(1) = 2$, with $h = 0.5$
c. $y' = -y + ty^{1/2}$, $2 \le t \le 3$, $y(2) = 2$, with $h = 0.25$
d. $y' = t^{-2}(\sin 2t - 2ty)$, $1 \le t \le 2$, $y(1) = 2$, with $h = 0.25$

3. Repeat Exercise 1 using Taylor’s method of order four.

4. Repeat Exercise 2 using Taylor’s method of order four.

5. Use Taylor’s method of order two to approximate the solution for each of the following
initial-value problems.
a. $y' = y/t - (y/t)^2$, $1 \le t \le 1.2$, $y(1) = 1$, with $h = 0.1$
b. $y' = \sin t + e^{-t}$, $0 \le t \le 1$, $y(0) = 0$, with $h = 0.5$
c. $y' = (y^2 + y)/t$, $1 \le t \le 3$, $y(1) = -2$, with $h = 0.5$
d. $y' = -ty + 4ty^{-1}$, $0 \le t \le 1$, $y(0) = 1$, with $h = 0.25$

6. Use Taylor’s method of order two to approximate the solution for each of the following
initial-value problems.
a. $y' = \dfrac{2 - 2ty}{t^2 + 1}$, $0 \le t \le 1$, $y(0) = 1$, with $h = 0.1$
b. $y' = \dfrac{y^2}{1 + t}$, $1 \le t \le 2$, $y(1) = -(\ln 2)^{-1}$, with $h = 0.1$
c. $y' = (y^2 + y)/t$, $1 \le t \le 3$, $y(1) = -2$, with $h = 0.2$
d. $y' = -ty + 4t/y$, $0 \le t \le 1$, $y(0) = 1$, with $h = 0.1$

7. Repeat Exercise 5 using Taylor’s method of order four.

8. Repeat Exercise 6 using Taylor’s method of order four.
9. Given the initial-value problem

$$y' = \frac{2}{t}\,y + t^2 e^t, \quad 1 \le t \le 2, \quad y(1) = 0,$$

with exact solution $y(t) = t^2(e^t - e)$:
a. Use Taylor’s method of order two with h = 0.1 to approximate the solution, and compare it with 
the actual values of y. 
b. Use the answers generated in part (a) and linear interpolation to approximate y at the following 
values, and compare them to the actual values of y. 
i. y(1.04) ii. y(1.55) iii. y(1.97)
c. Use Taylor’s method of order four with h = 0.1 to approximate the solution, and compare it 
with the actual values of y. 
d. Use the answers generated in part (c) and piecewise cubic Hermite interpolation to approximate 
y at the following values, and compare them to the actual values of y. 
i. y(1.04) ii. y(1.55) iii. y(1.97)
10. Given the initial-value problem

$$y' = \frac{1}{t^2} - \frac{y}{t} - y^2, \quad 1 \le t \le 2, \quad y(1) = -1,$$

with exact solution $y(t) = -1/t$:
a. Use Taylor’s method of order two with h = 0.05 to approximate the solution, and compare it 
with the actual values of y. 
b. Use the answers generated in part (a) and linear interpolation to approximate the following values 
of y, and compare them to the actual values. 
i. y(1.052) ii. y(1.555) iii. y(1.978)
c. Use Taylor’s method of order four with h = 0.05 to approximate the solution, and compare it 
with the actual values of y. 


d. Use the answers generated in part (c) and piecewise cubic Hermite interpolation to approximate 
the following values of y, and compare them to the actual values. 


i. y(1.052) ii. y(1.555) iii. y(1.978)
11. A projectile of mass m = 0.11 kg shot vertically upward with initial velocity v(0) = 8 m/s is slowed
due to the force of gravity, $F_g = -mg$, and due to air resistance, $F_r = -kv|v|$, where $g = 9.8$ m/s$^2$
and $k = 0.002$ kg/m. The differential equation for the velocity v is given by

$$mv' = -mg - kv|v|.$$

a. Find the velocity after 0.1, 0.2, ..., 1.0 s.
b. To the nearest tenth of a second, determine when the projectile reaches its maximum height and
begins falling.
12. Use the Taylor method of order two with h = 0.1 to approximate the solution to

$$y' = 1 + t\sin(ty), \quad 0 \le t \le 2, \quad y(0) = 0.$$


5.4 Runge-Kutta Methods


The Taylor methods outlined in the previous section have the desirable property of high- 
order local truncation error, but the disadvantage of requiring the computation and evaluation 
of the derivatives of f(t, y). This is a complicated and time-consuming procedure for most 
problems, so the Taylor methods are seldom used in practice. 
[Margin note: In the later 1800s, Carl Runge (1856-1927) used methods similar to those in this
section to derive numerous formulas for approximating the solution to initial-value problems.]

[Margin note: In 1901, Martin Wilhelm Kutta (1867-1944) generalized the methods that Runge
developed in 1895 to incorporate systems of first-order differential equations. These
techniques differ slightly from those we currently call Runge-Kutta methods.]

Runge-Kutta methods have the high-order local truncation error of the Taylor methods
but eliminate the need to compute and evaluate the derivatives of f(t, y). Before presenting
the ideas behind their derivation, we need to consider Taylor’s Theorem in two variables.
The proof of this result can be found in any standard book on advanced calculus (see, for
example, [Fu], p. 331).


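Theorem 5.13 Suppose that $f(t, y)$ and all its partial derivatives of order less than or equal to
$n + 1$ are continuous on $D = \{(t, y) \mid a \le t \le b,\ c \le y \le d\}$, and let $(t_0, y_0) \in D$. For every
$(t, y) \in D$, there exists $\xi$ between $t$ and $t_0$ and $\mu$ between $y$ and $y_0$ with

$$f(t, y) = P_n(t, y) + R_n(t, y),$$

where

$$\begin{aligned}
P_n(t, y) = {} & f(t_0, y_0) + \left[(t - t_0)\frac{\partial f}{\partial t}(t_0, y_0) + (y - y_0)\frac{\partial f}{\partial y}(t_0, y_0)\right] \\
& + \left[\frac{(t - t_0)^2}{2}\frac{\partial^2 f}{\partial t^2}(t_0, y_0) + (t - t_0)(y - y_0)\frac{\partial^2 f}{\partial t\,\partial y}(t_0, y_0) + \frac{(y - y_0)^2}{2}\frac{\partial^2 f}{\partial y^2}(t_0, y_0)\right] + \cdots \\
& + \left[\frac{1}{n!}\sum_{j=0}^{n}\binom{n}{j}(t - t_0)^{n-j}(y - y_0)^{j}\frac{\partial^n f}{\partial t^{n-j}\,\partial y^{j}}(t_0, y_0)\right]
\end{aligned}$$

and

$$R_n(t, y) = \frac{1}{(n+1)!}\sum_{j=0}^{n+1}\binom{n+1}{j}(t - t_0)^{n+1-j}(y - y_0)^{j}\frac{\partial^{n+1} f}{\partial t^{n+1-j}\,\partial y^{j}}(\xi, \mu).$$

The function $P_n(t, y)$ is called the nth Taylor polynomial in two variables for the
function $f$ about $(t_0, y_0)$, and $R_n(t, y)$ is the remainder term associated with $P_n(t, y)$.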


Example 1 Use Maple to determine $P_2(t, y)$, the second Taylor polynomial about (2, 3) for the
function

$$f(t, y) = \exp\left[-\frac{(t - 2)^2}{4} - \frac{(y - 3)^2}{4}\right]\cos(2t + y - 7).$$

Solution To determine $P_2(t, y)$ we need the values of f and its first and second partial
derivatives at (2, 3). The evaluation of the function is easy,

$$f(2, 3) = e^{(-0^2/4 - 0^2/4)}\cos(4 + 3 - 7) = 1,$$

but the computations involved with the partial derivatives are quite tedious. However, higher
dimensional Taylor polynomials are available in the MultivariateCalculus subpackage of
the Student package, which is accessed with the command


with(Student[MultivariateCalculus])


The first option of the TaylorApproximation command is the function, the second specifies
the point $(t_0, y_0)$ where the polynomial is centered, and the third specifies the degree of the
polynomial. So we issue the command


TaylorApproximation(exp(-(t - 2)^2/4 - (y - 3)^2/4) * cos(2*t + y - 7), [t, y] = [2, 3], 2)


The response from this Maple command is the polynomial

$$1 - \frac{9}{4}(t - 2)^2 - 2(t - 2)(y - 3) - \frac{3}{4}(y - 3)^2.$$


A plot option is also available by adding a fourth option to the TaylorApproximation 
command in the form output = plot. The plot in the default form is quite crude, however, 
because not many points are plotted for the function and the polynomial. A better illustration 
is seen in Figure 5.5. 


Figure 5.5 [Surface plot of $f(t, y) = \exp\{-(t - 2)^2/4 - (y - 3)^2/4\}\cos(2t + y - 7)$ together with
its second Taylor polynomial $P_2(t, y)$ about (2, 3).]


The final parameter in this command indicates that we want the second multivariate
Taylor polynomial, that is, the quadratic polynomial. If this parameter is 2, we get the
quadratic polynomial, and if it is 0 or 1, we get the constant polynomial 1, because there are
no linear terms. When this parameter is omitted, it defaults to 6 and gives the sixth Taylor
polynomial.
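The quadratic polynomial reported for this example can be sanity-checked without a computer algebra system. The sketch below is an illustration under stated assumptions: it estimates the second partial derivatives of f at (2, 3) by central differences (the helper `second_partials` and its step size are choices made here, not part of the text) and compares them with the coefficients -9/4, -2, and -3/4.

```python
import math

def f(t, y):
    """The function of Example 1."""
    return math.exp(-(t - 2) ** 2 / 4 - (y - 3) ** 2 / 4) * math.cos(2 * t + y - 7)

def p2(t, y):
    """Second Taylor polynomial of f about (2, 3), as reported in the text."""
    u, v = t - 2.0, y - 3.0
    return 1.0 - 2.25 * u * u - 2.0 * u * v - 0.75 * v * v

def second_partials(g, t0, y0, k=1e-3):
    """Central-difference estimates of g_tt, g_ty, g_yy at (t0, y0)."""
    g_tt = (g(t0 + k, y0) - 2 * g(t0, y0) + g(t0 - k, y0)) / k ** 2
    g_yy = (g(t0, y0 + k) - 2 * g(t0, y0) + g(t0, y0 - k)) / k ** 2
    g_ty = (g(t0 + k, y0 + k) - g(t0 + k, y0 - k)
            - g(t0 - k, y0 + k) + g(t0 - k, y0 - k)) / (4 * k ** 2)
    return g_tt, g_ty, g_yy

g_tt, g_ty, g_yy = second_partials(f, 2.0, 3.0)
# The quadratic coefficients of P2 are g_tt/2, g_ty, and g_yy/2.
print(g_tt / 2, g_ty, g_yy / 2)

# The gap between f and P2 shrinks quickly as the point approaches (2, 3).
for d in (0.2, 0.1, 0.05):
    print(d, abs(f(2 + d, 3 + d) - p2(2 + d, 3 + d)))
```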


Runge-Kutta Methods of Order Two 


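The first step in deriving a Runge-Kutta method is to determine values for $a_1$, $\alpha_1$, and $\beta_1$
with the property that $a_1 f(t + \alpha_1, y + \beta_1)$ approximates

$$T^{(2)}(t, y) = f(t, y) + \frac{h}{2}\,f'(t, y),$$

with error no greater than $O(h^2)$, which is the same as the order of the local truncation error
for the Taylor method of order two. Since

$$f'(t, y) = \frac{df}{dt}(t, y) = \frac{\partial f}{\partial t}(t, y) + \frac{\partial f}{\partial y}(t, y)\cdot y'(t) \quad \text{and} \quad y'(t) = f(t, y),$$

we have

$$T^{(2)}(t, y) = f(t, y) + \frac{h}{2}\frac{\partial f}{\partial t}(t, y) + \frac{h}{2}\frac{\partial f}{\partial y}(t, y)\cdot f(t, y). \tag{5.18}$$

Expanding $f(t + \alpha_1, y + \beta_1)$ in its Taylor polynomial of degree one about $(t, y)$ gives

$$a_1 f(t + \alpha_1, y + \beta_1) = a_1 f(t, y) + a_1\alpha_1\frac{\partial f}{\partial t}(t, y) + a_1\beta_1\frac{\partial f}{\partial y}(t, y) + a_1\cdot R_1(t + \alpha_1, y + \beta_1), \tag{5.19}$$

where

$$R_1(t + \alpha_1, y + \beta_1) = \frac{\alpha_1^2}{2}\frac{\partial^2 f}{\partial t^2}(\xi, \mu) + \alpha_1\beta_1\frac{\partial^2 f}{\partial t\,\partial y}(\xi, \mu) + \frac{\beta_1^2}{2}\frac{\partial^2 f}{\partial y^2}(\xi, \mu), \tag{5.20}$$

for some $\xi$ between $t$ and $t + \alpha_1$ and $\mu$ between $y$ and $y + \beta_1$.

Matching the coefficients of f and its derivatives in Eqs. (5.18) and (5.19) gives the
three equations

$$f(t, y):\ a_1 = 1; \qquad \frac{\partial f}{\partial t}(t, y):\ a_1\alpha_1 = \frac{h}{2}; \qquad \text{and} \qquad \frac{\partial f}{\partial y}(t, y):\ a_1\beta_1 = \frac{h}{2}\,f(t, y).$$

The parameters $a_1$, $\alpha_1$, and $\beta_1$ are therefore

$$a_1 = 1, \quad \alpha_1 = \frac{h}{2}, \quad \text{and} \quad \beta_1 = \frac{h}{2}\,f(t, y),$$

so

$$T^{(2)}(t, y) = f\left(t + \frac{h}{2},\ y + \frac{h}{2}f(t, y)\right) - R_1\left(t + \frac{h}{2},\ y + \frac{h}{2}f(t, y)\right),$$

and from Eq. (5.20),

$$R_1\left(t + \frac{h}{2},\ y + \frac{h}{2}f(t, y)\right) = \frac{h^2}{8}\frac{\partial^2 f}{\partial t^2}(\xi, \mu) + \frac{h^2}{4}f(t, y)\frac{\partial^2 f}{\partial t\,\partial y}(\xi, \mu) + \frac{h^2}{8}(f(t, y))^2\frac{\partial^2 f}{\partial y^2}(\xi, \mu).$$

If all the second-order partial derivatives of f are bounded, then

$$R_1\left(t + \frac{h}{2},\ y + \frac{h}{2}f(t, y)\right)$$

is $O(h^2)$. As a consequence:

• The order of error for this new method is the same as that of the Taylor method of order
two.

The difference-equation method resulting from replacing $T^{(2)}(t, y)$ in Taylor’s method
of order two by $f(t + (h/2), y + (h/2)f(t, y))$ is a specific Runge-Kutta method known as
the Midpoint method.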
Midpoint Method

$$w_0 = \alpha,$$
$$w_{i+1} = w_i + h\,f\left(t_i + \frac{h}{2},\ w_i + \frac{h}{2}f(t_i, w_i)\right), \quad \text{for } i = 0, 1, \ldots, N-1.$$

Only three parameters are present in $a_1 f(t + \alpha_1, y + \beta_1)$ and all are needed in the
match of $T^{(2)}$. So a more complicated form is required to satisfy the conditions for any of
the higher-order Taylor methods.

The most appropriate four-parameter form for approximating

$$T^{(3)}(t, y) = f(t, y) + \frac{h}{2}\,f'(t, y) + \frac{h^2}{6}\,f''(t, y)$$

is

$$a_1 f(t, y) + a_2 f(t + \alpha_2, y + \delta_2 f(t, y)); \tag{5.21}$$

and even with this, there is insufficient flexibility to match the term

$$\frac{h^2}{6}\left[\frac{\partial f}{\partial y}(t, y)\right]^2 f(t, y),$$

resulting from the expansion of $(h^2/6)f''(t, y)$. Consequently, the best that can be obtained
from using (5.21) are methods with $O(h^2)$ local truncation error.

The fact that (5.21) has four parameters, however, gives a flexibility in their choice,
so a number of $O(h^2)$ methods can be derived. One of the most important is the Modified
Euler method, which corresponds to choosing $a_1 = a_2 = \frac{1}{2}$ and $\alpha_2 = \delta_2 = h$. It has the
following difference-equation form.


Modified Euler Method

$$w_0 = \alpha,$$
$$w_{i+1} = w_i + \frac{h}{2}\left[f(t_i, w_i) + f(t_{i+1}, w_i + h f(t_i, w_i))\right], \quad \text{for } i = 0, 1, \ldots, N-1.$$


Example 2 Use the Midpoint method and the Modified Euler method with $N = 10$, $h = 0.2$,
$t_i = 0.2i$, and $w_0 = 0.5$ to approximate the solution to our usual example,

$$y' = y - t^2 + 1, \quad 0 \le t \le 2, \quad y(0) = 0.5.$$

Solution The difference equations produced from the various formulas are

Midpoint method: $w_{i+1} = 1.22w_i - 0.0088i^2 - 0.008i + 0.218$;
Modified Euler method: $w_{i+1} = 1.22w_i - 0.0088i^2 - 0.008i + 0.216$,

for each $i = 0, 1, \ldots, 9$. The first two steps of these methods give

Midpoint method: $w_1 = 1.22(0.5) - 0.0088(0)^2 - 0.008(0) + 0.218 = 0.828$;
Modified Euler method: $w_1 = 1.22(0.5) - 0.0088(0)^2 - 0.008(0) + 0.216 = 0.826$,
and

Midpoint method: $w_2 = 1.22(0.828) - 0.0088(1)^2 - 0.008(1) + 0.218 = 1.21136$;
Modified Euler method: $w_2 = 1.22(0.826) - 0.0088(1)^2 - 0.008(1) + 0.216 = 1.20692$.

Table 5.6 lists all the results of the calculations. For this problem, the Midpoint method
is superior to the Modified Euler method.

Table 5.6

                     Midpoint                  Modified Euler
 t_i    y(t_i)       Method       Error        Method       Error
 0.0    0.5000000    0.5000000    0            0.5000000    0
 0.2    0.8292986    0.8280000    0.0012986    0.8260000    0.0032986
 0.4    1.2140877    1.2113600    0.0027277    1.2069200    0.0071677
 0.6    1.6489406    1.6446592    0.0042814    1.6372424    0.0116982
 0.8    2.1272295    2.1212842    0.0059453    2.1102357    0.0169938
 1.0    2.6408591    2.6331668    0.0076923    2.6176876    0.0231715
 1.2    3.1799415    3.1704634    0.0094781    3.1495789    0.0303627
 1.4    3.7324000    3.7211654    0.0112346    3.6936862    0.0387138
 1.6    4.2834838    4.2706218    0.0128620    4.2350972    0.0483866
 1.8    4.8151763    4.8009586    0.0142177    4.7556185    0.0595577
 2.0    5.3054720    5.2903695    0.0151025    5.2330546    0.0724173

[Margin note: Karl Heun (1859-1929) was a professor at the Technical University of Karlsruhe.
He introduced Heun’s method in a paper published in 1900. [Heu]]


Runge-Kutta methods are also options within the Maple command InitialValueProblem.
The form and output for Runge-Kutta methods are the same as available under Euler’s
and Taylor’s methods, as discussed in Sections 5.1 and 5.2.
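Outside Maple, the two second-order methods of Example 2 take only a few lines each. The following is a minimal sketch, with hypothetical helper names (`midpoint_step`, `modified_euler_step`, `solve`), applied to the usual test problem so that its output can be compared with Table 5.6.

```python
def f(t, y):
    """Right-hand side of the usual example y' = y - t^2 + 1."""
    return y - t * t + 1.0

def midpoint_step(f, t, w, h):
    """One step of the Midpoint method."""
    return w + h * f(t + h / 2, w + (h / 2) * f(t, w))

def modified_euler_step(f, t, w, h):
    """One step of the Modified Euler method (f(t, w) is evaluated once)."""
    slope = f(t, w)
    return w + (h / 2) * (slope + f(t + h, w + h * slope))

def solve(step, f, a, b, alpha, N):
    """Apply a one-step method N times on [a, b], starting from w_0 = alpha."""
    h = (b - a) / N
    ws = [alpha]
    for i in range(N):
        ws.append(step(f, a + i * h, ws[-1], h))
    return ws

mid = solve(midpoint_step, f, 0.0, 2.0, 0.5, 10)
mod = solve(modified_euler_step, f, 0.0, 2.0, 0.5, 10)

for i, (wm, we) in enumerate(zip(mid, mod)):
    print(f"{0.2 * i:.1f}  {wm:.7f}  {we:.7f}")
```

Both steps use two evaluations of f, which is the cost figure used later when the methods are compared.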


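Higher-Order Runge-Kutta Methods

The term $T^{(3)}(t, y)$ can be approximated with error $O(h^3)$ by an expression of the form

$$f(t + \alpha_1, y + \delta_1 f(t + \alpha_2, y + \delta_2 f(t, y))),$$

involving four parameters; the algebra involved in the determination of $\alpha_1$, $\delta_1$, $\alpha_2$, and $\delta_2$
is quite involved. The most common $O(h^3)$ method is Heun’s method, given by

$$w_0 = \alpha,$$
$$w_{i+1} = w_i + \frac{h}{4}\left(f(t_i, w_i) + 3f\left(t_i + \frac{2h}{3},\ w_i + \frac{2h}{3}f\left(t_i + \frac{h}{3},\ w_i + \frac{h}{3}f(t_i, w_i)\right)\right)\right),$$

for $i = 0, 1, \ldots, N-1$.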


Illustration Applying Heun’s method with $N = 10$, $h = 0.2$, $t_i = 0.2i$, and $w_0 = 0.5$ to
approximate the solution to our usual example,

$$y' = y - t^2 + 1, \quad 0 \le t \le 2, \quad y(0) = 0.5,$$

gives the values in Table 5.7. Note the decreased error throughout the range over the Midpoint
and Modified Euler approximations.

Table 5.7

                     Heun’s
 t_i    y(t_i)       Method       Error
 0.0    0.5000000    0.5000000    0
 0.2    0.8292986    0.8292444    0.0000542
 0.4    1.2140877    1.2139750    0.0001127
 0.6    1.6489406    1.6487659    0.0001747
 0.8    2.1272295    2.1269905    0.0002390
 1.0    2.6408591    2.6405555    0.0003035
 1.2    3.1799415    3.1795763    0.0003653
 1.4    3.7324000    3.7319803    0.0004197
 1.6    4.2834838    4.2830230    0.0004608
 1.8    4.8151763    4.8146966    0.0004797
 2.0    5.3054720    5.3050072    0.0004648
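The nested form of Heun's third-order method is easier to read when the inner evaluations are named. This sketch is illustrative only; the function name `heun3_step` and the intermediate variable names are assumptions made here, and the method is applied to the same test problem as Table 5.7.

```python
def f(t, y):
    """Right-hand side of the usual example y' = y - t^2 + 1."""
    return y - t * t + 1.0

def heun3_step(f, t, w, h):
    """One step of Heun's third-order method, unnesting the formula from the inside out."""
    s1 = f(t, w)
    s2 = f(t + h / 3, w + (h / 3) * s1)
    s3 = f(t + 2 * h / 3, w + (2 * h / 3) * s2)
    return w + (h / 4) * (s1 + 3 * s3)

h, N = 0.2, 10
ws = [0.5]
for i in range(N):
    ws.append(heun3_step(f, i * h, ws[-1], h))

for i, w in enumerate(ws):
    print(f"{i * h:.1f}  {w:.7f}")
```

Each step costs three evaluations of f, one more than the Midpoint and Modified Euler methods.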


Runge-Kutta methods of order three are not generally used. The most common Runge-Kutta
method in use is of order four; in difference-equation form, it is given by the following.


Runge-Kutta Order Four

$$\begin{aligned}
w_0 &= \alpha, \\
k_1 &= h f(t_i, w_i), \\
k_2 &= h f\left(t_i + \frac{h}{2},\ w_i + \frac{1}{2}k_1\right), \\
k_3 &= h f\left(t_i + \frac{h}{2},\ w_i + \frac{1}{2}k_2\right), \\
k_4 &= h f(t_{i+1}, w_i + k_3), \\
w_{i+1} &= w_i + \frac{1}{6}(k_1 + 2k_2 + 2k_3 + k_4),
\end{aligned}$$

for each $i = 0, 1, \ldots, N-1$. This method has local truncation error $O(h^4)$, provided the
solution $y(t)$ has five continuous derivatives. We introduce the notation $k_1, k_2, k_3, k_4$ into
the method to eliminate the need for successive nesting in the second variable of $f(t, y)$.
Exercise 32 shows how complicated this nesting becomes.

Algorithm 5.2 implements the Runge-Kutta method of order four. 


Runge-Kutta (Order Four)

To approximate the solution of the initial-value problem

$$y' = f(t, y), \quad a \le t \le b, \quad y(a) = \alpha,$$

at (N + 1) equally spaced numbers in the interval [a, b]:

INPUT endpoints a, b; integer N; initial condition α.

OUTPUT approximation w to y at the (N + 1) values of t.

Step 1 Set h = (b - a)/N;
           t = a;
           w = α;
       OUTPUT (t, w).

Step 2 For i = 1, 2, ..., N do Steps 3-5.

    Step 3 Set K1 = hf(t, w);
               K2 = hf(t + h/2, w + K1/2);
               K3 = hf(t + h/2, w + K2/2);
               K4 = hf(t + h, w + K3).

    Step 4 Set w = w + (K1 + 2K2 + 2K3 + K4)/6; (Compute w_i.)
               t = a + ih. (Compute t_i.)

    Step 5 OUTPUT (t, w).

Step 6 STOP.
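The steps of the algorithm translate almost line for line into code. The following is a sketch under the assumption that the right-hand side is supplied as an ordinary function `f(t, w)`; the function name `rk4` and the list-of-pairs return value are choices made for this illustration.

```python
def rk4(f, a, b, alpha, N):
    """Runge-Kutta method of order four, following Steps 1-6 of the algorithm.

    Returns the list of pairs (t_i, w_i) for i = 0, 1, ..., N.
    """
    h = (b - a) / N                      # Step 1
    t, w = a, alpha
    out = [(t, w)]
    for i in range(1, N + 1):            # Step 2
        k1 = h * f(t, w)                 # Step 3
        k2 = h * f(t + h / 2, w + k1 / 2)
        k3 = h * f(t + h / 2, w + k2 / 2)
        k4 = h * f(t + h, w + k3)
        w = w + (k1 + 2 * k2 + 2 * k3 + k4) / 6   # Step 4: compute w_i
        t = a + i * h                              # compute t_i
        out.append((t, w))               # Step 5
    return out                           # Step 6

def f(t, y):
    """Right-hand side of the usual example y' = y - t^2 + 1."""
    return y - t * t + 1.0

results = rk4(f, 0.0, 2.0, 0.5, 10)
for t, w in results:
    print(f"{t:.1f}  {w:.7f}")
```

Computing t as a + ih, rather than adding h repeatedly, follows the algorithm and avoids accumulating rounding error in the nodes.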


Example 3 Use the Runge-Kutta method of order four with $h = 0.2$, $N = 10$, and $t_i = 0.2i$ to obtain
approximations to the solution of the initial-value problem

$$y' = y - t^2 + 1, \quad 0 \le t \le 2, \quad y(0) = 0.5.$$

Solution The approximation to y(0.2) is obtained by

$$\begin{aligned}
w_0 &= 0.5, \\
k_1 &= 0.2 f(0, 0.5) = 0.2(1.5) = 0.3, \\
k_2 &= 0.2 f(0.1, 0.65) = 0.328, \\
k_3 &= 0.2 f(0.1, 0.664) = 0.3308, \\
k_4 &= 0.2 f(0.2, 0.8308) = 0.35816, \\
w_1 &= 0.5 + \frac{1}{6}(0.3 + 2(0.328) + 2(0.3308) + 0.35816) = 0.8292933.
\end{aligned}$$

The remaining results and their errors are listed in Table 5.8.

Table 5.8

                           Runge-Kutta
        Exact              Order Four    Error
 t_i    y_i = y(t_i)       w_i           |y_i - w_i|
 0.0    0.5000000          0.5000000     0
 0.2    0.8292986          0.8292933     0.0000053
 0.4    1.2140877          1.2140762     0.0000114
 0.6    1.6489406          1.6489220     0.0000186
 0.8    2.1272295          2.1272027     0.0000269
 1.0    2.6408591          2.6408227     0.0000364
 1.2    3.1799415          3.1798942     0.0000474
 1.4    3.7324000          3.7323401     0.0000599
 1.6    4.2834838          4.2834095     0.0000743
 1.8    4.8151763          4.8150857     0.0000906
 2.0    5.3054720          5.3053630     0.0001089
To obtain Runge-Kutta order 4 method results with InitialValueProblem use the option
method = rungekutta, submethod = rk4. The results produced from the following call for
our standard example problem agree with those in Table 5.8.


C := InitialValueProblem(deq, y(0) = 0.5, t = 2, method = rungekutta, submethod = rk4,
numsteps = 10, output = information, digits = 8)


Computational Comparisons 


The main computational effort in applying the Runge-Kutta methods is the evaluation of f.
In the second-order methods, the local truncation error is $O(h^2)$, and the cost is two function
evaluations per step. The Runge-Kutta method of order four requires 4 evaluations per step,
and the local truncation error is $O(h^4)$. Butcher (see [But] for a summary) has established the
relationship between the number of evaluations per step and the order of the local truncation
error shown in Table 5.9. This table indicates why the methods of order less than five with
smaller step size are used in preference to the higher-order methods using a larger step size.


Table 5.9

 Evaluations per step                    2         3         4         5 ≤ n ≤ 7     8 ≤ n ≤ 9     10 ≤ n
 Best possible local truncation error    O(h^2)    O(h^3)    O(h^4)    O(h^(n-1))    O(h^(n-2))    O(h^(n-3))


One measure of comparing the lower-order Runge-Kutta methods is described as
follows:


• The Runge-Kutta method of order four requires four evaluations per step, whereas Euler’s
method requires only one evaluation. Hence if the Runge-Kutta method of order four is
to be superior it should give more accurate answers than Euler’s method with one-fourth
the step size. Similarly, if the Runge-Kutta method of order four is to be superior to the
second-order Runge-Kutta methods, which require two evaluations per step, it should
give more accuracy with step size h than a second-order method with step size h/2.


The following illustrates the superiority of the Runge-Kutta fourth-order method by
this measure for the initial-value problem that we have been considering.


Illustration For the problem

$$y' = y - t^2 + 1, \quad 0 \le t \le 2, \quad y(0) = 0.5,$$

Euler’s method with h = 0.025, the Modified Euler method with h = 0.05, and the
Runge-Kutta fourth-order method with h = 0.1 are compared at the common mesh points of these
methods 0.1, 0.2, 0.3, 0.4, and 0.5. Each of these techniques requires 20 function evaluations
to determine the values listed in Table 5.10 to approximate y(0.5). In this example, the
fourth-order method is clearly superior.
Table 5.10

                                   Modified      Runge-Kutta
                     Euler         Euler         Order Four
 t_i    Exact        h = 0.025     h = 0.05      h = 0.1
 0.0    0.5000000    0.5000000     0.5000000     0.5000000
 0.1    0.6574145    0.6554982     0.6573085     0.6574144
 0.2    0.8292986    0.8253385     0.8290778     0.8292983
 0.3    1.0150706    1.0089334     1.0147254     1.0150701
 0.4    1.2140877    1.2056345     1.2136079     1.2140869
 0.5    1.4256394    1.4147264     1.4250141     1.4256384

EXERCISE SET 5.4

1. Use the Modified Euler method to approximate the solutions to each of the following initial-value
problems, and compare the results to the actual values.
a. $y' = te^{3t} - 2y$, $0 \le t \le 1$, $y(0) = 0$, with $h = 0.5$; actual solution $y(t) = \frac{1}{5}te^{3t} - \frac{1}{25}e^{3t} + \frac{1}{25}e^{-2t}$.
b. $y' = 1 + (t - y)^2$, $2 \le t \le 3$, $y(2) = 1$, with $h = 0.5$; actual solution $y(t) = t + \frac{1}{1 - t}$.
c. $y' = 1 + y/t$, $1 \le t \le 2$, $y(1) = 2$, with $h = 0.25$; actual solution $y(t) = t\ln t + 2t$.
d. $y' = \cos 2t + \sin 3t$, $0 \le t \le 1$, $y(0) = 1$, with $h = 0.25$; actual solution $y(t) = \frac{1}{2}\sin 2t - \frac{1}{3}\cos 3t + \frac{4}{3}$.

2. Use the Modified Euler method to approximate the solutions to each of the following initial-value
problems, and compare the results to the actual values.
a. $y' = e^{t-y}$, $0 \le t \le 1$, $y(0) = 1$, with $h = 0.5$; actual solution $y(t) = \ln(e^t + e - 1)$.
b. $y' = \dfrac{1 + t}{1 + y}$, $1 \le t \le 2$, $y(1) = 2$, with $h = 0.5$; actual solution $y(t) = \sqrt{t^2 + 2t + 6} - 1$.
c. $y' = -y + ty^{1/2}$, $2 \le t \le 3$, $y(2) = 2$, with $h = 0.25$; actual solution $y(t) = \left(t - 2 + \sqrt{2}\,e\,e^{-t/2}\right)^2$.
d. $y' = t^{-2}(\sin 2t - 2ty)$, $1 \le t \le 2$, $y(1) = 2$, with $h = 0.25$; actual solution $y(t) = \frac{1}{2}t^{-2}(4 + \cos 2 - \cos 2t)$.

3. Use the Modified Euler method to approximate the solutions to each of the following initial-value
problems, and compare the results to the actual values.
a. $y' = y/t - (y/t)^2$, $1 \le t \le 2$, $y(1) = 1$, with $h = 0.1$; actual solution $y(t) = t/(1 + \ln t)$.
b. $y' = 1 + y/t + (y/t)^2$, $1 \le t \le 3$, $y(1) = 0$, with $h = 0.2$; actual solution $y(t) = t\tan(\ln t)$.
c. $y' = -(y + 1)(y + 3)$, $0 \le t \le 2$, $y(0) = -2$, with $h = 0.2$; actual solution $y(t) = -3 + 2(1 + e^{-2t})^{-1}$.
d. $y' = -5y + 5t^2 + 2t$, $0 \le t \le 1$, $y(0) = \frac{1}{3}$, with $h = 0.1$; actual solution $y(t) = t^2 + \frac{1}{3}e^{-5t}$.

4. Use the Modified Euler method to approximate the solutions to each of the following initial-value
problems, and compare the results to the actual values.
a. $y' = \dfrac{2 - 2ty}{t^2 + 1}$, $0 \le t \le 1$, $y(0) = 1$, with $h = 0.1$; actual solution $y(t) = \dfrac{2t + 1}{t^2 + 1}$.
b. $y' = \dfrac{y^2}{1 + t}$, $1 \le t \le 2$, $y(1) = -(\ln 2)^{-1}$, with $h = 0.1$; actual solution $y(t) = \dfrac{-1}{\ln(t + 1)}$.
c. $y' = (y^2 + y)/t$, $1 \le t \le 3$, $y(1) = -2$, with $h = 0.2$; actual solution $y(t) = \dfrac{2t}{1 - 2t}$.
d. $y' = -ty + 4t/y$, $0 \le t \le 1$, $y(0) = 1$, with $h = 0.1$; actual solution $y(t) = \sqrt{4 - 3e^{-t^2}}$.
5. Repeat Exercise 1 using the Midpoint method.

6. Repeat Exercise 2 using the Midpoint method.

7. Repeat Exercise 3 using the Midpoint method.

8. Repeat Exercise 4 using the Midpoint method.

9. Repeat Exercise 1 using Heun’s method.

10. Repeat Exercise 2 using Heun’s method.

11. Repeat Exercise 3 using Heun’s method.

12. Repeat Exercise 4 using Heun’s method.

13. Repeat Exercise 1 using the Runge-Kutta method of order four.

14. Repeat Exercise 2 using the Runge-Kutta method of order four.

15. Repeat Exercise 3 using the Runge-Kutta method of order four.

16. Repeat Exercise 4 using the Runge-Kutta method of order four.

17. Use the results of Exercise 3 and linear interpolation to approximate values of y(t), and compare the
results to the actual values.
a. y(1.25) and y(1.93)    b. y(2.1) and y(2.75)
c. y(1.3) and y(1.93)     d. y(0.54) and y(0.94)

18. Use the results of Exercise 4 and linear interpolation to approximate values of y(t), and compare the
results to the actual values.
a. y(0.54) and y(0.94)    b. y(1.25) and y(1.93)
c. y(1.3) and y(2.93)     d. y(0.54) and y(0.94)

19. Repeat Exercise 17 using the results of Exercise 7.

20. Repeat Exercise 18 using the results of Exercise 8.

21. Repeat Exercise 17 using the results of Exercise 11.

22. Repeat Exercise 18 using the results of Exercise 12.

23. Repeat Exercise 17 using the results of Exercise 15.

24. Repeat Exercise 18 using the results of Exercise 16.

25. Use the results of Exercise 15 and Cubic Hermite interpolation to approximate values of y(t), and
compare the approximations to the actual values.
a. y(1.25) and y(1.93)    b. y(2.1) and y(2.75)
c. y(1.3) and y(1.93)     d. y(0.54) and y(0.94)

26. Use the results of Exercise 16 and Cubic Hermite interpolation to approximate values of y(t), and
compare the approximations to the actual values.
a. y(0.54) and y(0.94)    b. y(1.25) and y(1.93)
c. y(1.3) and y(2.93)     d. y(0.54) and y(0.94)

27. Show that the Midpoint method and the Modified Euler method give the same approximations to the
initial-value problem

$$y' = -y + t + 1, \quad 0 \le t \le 1, \quad y(0) = 1,$$

for any choice of h. Why is this true?

28. Water flows from an inverted conical tank with circular orifice at the rate

$$\frac{dx}{dt} = -0.6\pi r^2\sqrt{2g}\,\frac{\sqrt{x}}{A(x)},$$

where r is the radius of the orifice, x is the height of the liquid level from the vertex of the cone,
and A(x) is the area of the cross section of the tank x units above the orifice. Suppose r = 0.1 ft,
g = 32.1 ft/s$^2$, and the tank has an initial water level of 8 ft and initial volume of $512(\pi/3)$ ft$^3$. Use
the Runge-Kutta method of order four to find the following.
a. The water level after 10 min with h = 20 s
b. When the tank will be empty, to within 1 min.
29. The irreversible chemical reaction in which two molecules of solid potassium dichromate (K$_2$Cr$_2$O$_7$),
two molecules of water (H$_2$O), and three atoms of solid sulfur (S) combine to yield three molecules of
the gas sulfur dioxide (SO$_2$), four molecules of solid potassium hydroxide (KOH), and two molecules
of solid chromic oxide (Cr$_2$O$_3$) can be represented symbolically by the stoichiometric equation:

\[ 2\mathrm{K_2Cr_2O_7} + 2\mathrm{H_2O} + 3\mathrm{S} \longrightarrow 4\mathrm{KOH} + 2\mathrm{Cr_2O_3} + 3\mathrm{SO_2}. \]

If $n_1$ molecules of K$_2$Cr$_2$O$_7$, $n_2$ molecules of H$_2$O, and $n_3$ molecules of S are originally available, the
following differential equation describes the amount $x(t)$ of KOH after time $t$:

\[ \frac{dx}{dt} = k\left(n_1 - \frac{x}{2}\right)^2 \left(n_2 - \frac{x}{2}\right)^2 \left(n_3 - \frac{3x}{4}\right)^3, \]

where $k$ is the velocity constant of the reaction. If $k = 6.22 \times 10^{-19}$, $n_1 = n_2 = 2 \times 10^3$, and
$n_3 = 3 \times 10^3$, use the Runge-Kutta method of order four to determine how many units of potassium
hydroxide will have been formed after 0.2 s.

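30. Show that the difference method

\[ w_0 = \alpha, \]
\[ w_{i+1} = w_i + a_1 f(t_i, w_i) + a_2 f(t_i + \alpha_2, w_i + \delta_2 f(t_i, w_i)), \]

for each $i = 0, 1, \ldots, N-1$, cannot have local truncation error $O(h^3)$ for any choice of constants
$a_1$, $a_2$, $\alpha_2$, and $\delta_2$.

31. Show that Heun’s method can be expressed in difference form, similar to that of the Runge-Kutta
method of order four, as

\[ w_0 = \alpha, \]
\[ k_1 = hf(t_i, w_i), \]
\[ k_2 = hf\left(t_i + \frac{h}{3}, w_i + \frac{1}{3}k_1\right), \]
\[ k_3 = hf\left(t_i + \frac{2h}{3}, w_i + \frac{2}{3}k_2\right), \]
\[ w_{i+1} = w_i + \frac{1}{4}(k_1 + 3k_3), \]

for each $i = 0, 1, \ldots, N-1$.

32. The Runge-Kutta method of order four can be written in the form

\[ w_0 = \alpha, \]
\[ w_{i+1} = w_i + \frac{h}{6} f(t_i, w_i) + \frac{h}{3} f(t_i + \alpha_1 h, w_i + \delta_1 h f(t_i, w_i)) \]
\[ \quad + \frac{h}{3} f(t_i + \alpha_2 h, w_i + \delta_2 h f(t_i + \gamma_2 h, w_i + \gamma_3 h f(t_i, w_i))) \]
\[ \quad + \frac{h}{6} f(t_i + \alpha_3 h, w_i + \delta_3 h f(t_i + \gamma_4 h, w_i + \gamma_5 h f(t_i + \gamma_6 h, w_i + \gamma_7 h f(t_i, w_i)))). \]

Find the values of the constants

\[ \alpha_1, \alpha_2, \alpha_3, \delta_1, \delta_2, \delta_3, \gamma_2, \gamma_3, \gamma_4, \gamma_5, \gamma_6, \text{ and } \gamma_7. \]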


5.5 Error Control and the Runge-Kutta-Fehlberg Method


In Section 4.6 we saw that the appropriate use of varying step sizes for integral approximations
produced efficient methods. In itself, this might not be sufficient to favor these methods
due to the increased complication of applying them. However, they have another feature
that makes them worthwhile. They incorporate in the step-size procedure an estimate of
the truncation error that does not require the approximation of the higher derivatives of the
function. These methods are called adaptive because they adapt the number and position
of the nodes used in the approximation to ensure that the truncation error is kept within a
specified bound.

(Margin note: You might like to review the Adaptive Quadrature material in Section 4.6 before
considering this material.)

There is a close connection between the problem of approximating the value of a 
definite integral and that of approximating the solution to an initial-value problem. It is 
not surprising, then, that there are adaptive methods for approximating the solutions to 
initial-value problems and that these methods are not only efficient, but also incorporate the 
control of error. 

Any one-step method for approximating the solution, $y(t)$, of the initial-value problem

\[ y' = f(t, y), \quad \text{for } a \le t \le b, \quad \text{with } y(a) = \alpha \]

can be expressed in the form

\[ w_{i+1} = w_i + h_i \phi(t_i, w_i, h_i), \quad \text{for } i = 0, 1, \ldots, N-1, \]

for some function $\phi$.

An ideal difference-equation method

\[ w_{i+1} = w_i + h_i \phi(t_i, w_i, h_i), \quad i = 0, 1, \ldots, N-1, \]

for approximating the solution, $y(t)$, to the initial-value problem

\[ y' = f(t, y), \quad a \le t \le b, \quad y(a) = \alpha, \]

would have the property that, given a tolerance $\varepsilon > 0$, a minimal number of mesh points
could be used to ensure that the global error, $|y(t_i) - w_i|$, did not exceed $\varepsilon$ for any
$i = 0, 1, \ldots, N$. Having a minimal number of mesh points and also controlling the global error
of a difference method is, not surprisingly, inconsistent with the points being equally spaced
in the interval. In this section we examine techniques used to control the error of a
difference-equation method in an efficient manner by the appropriate choice of mesh points.

Although we cannot generally determine the global error of a method, we will see 
in Section 5.10 that there is a close connection between the local truncation error and the 
global error. By using methods of differing order we can predict the local truncation error 
and, using this prediction, choose a step size that will keep it and the global error in check. 

To illustrate the technique, suppose that we have two approximation techniques. The
first is obtained from an $n$th-order Taylor method of the form

\[ y(t_{i+1}) = y(t_i) + h\phi(t_i, y(t_i), h) + O(h^{n+1}), \]

and produces approximations with local truncation error $\tau_{i+1}(h) = O(h^n)$. It is given by

\[ w_0 = \alpha, \]
\[ w_{i+1} = w_i + h\phi(t_i, w_i, h), \quad \text{for } i \ge 0. \]

In general, the method is generated by applying a Runge-Kutta modification to the Taylor
method, but the specific derivation is unimportant.

The second method is similar but one order higher; it comes from an $(n+1)$st-order
Taylor method of the form

\[ y(t_{i+1}) = y(t_i) + h\tilde{\phi}(t_i, y(t_i), h) + O(h^{n+2}), \]

and produces approximations with local truncation error $\tilde{\tau}_{i+1}(h) = O(h^{n+1})$. It is given by

\[ \tilde{w}_0 = \alpha, \]
\[ \tilde{w}_{i+1} = \tilde{w}_i + h\tilde{\phi}(t_i, \tilde{w}_i, h), \quad \text{for } i \ge 0. \]

We first make the assumption that $w_i \approx y(t_i) \approx \tilde{w}_i$ and choose a fixed step size $h$ to
generate the approximations $w_{i+1}$ and $\tilde{w}_{i+1}$ to $y(t_{i+1})$. Then


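\[ \tau_{i+1}(h) = \frac{y(t_{i+1}) - y(t_i)}{h} - \phi(t_i, y(t_i), h) \]
\[ = \frac{y(t_{i+1}) - w_i}{h} - \phi(t_i, w_i, h) \]
\[ = \frac{y(t_{i+1}) - [w_i + h\phi(t_i, w_i, h)]}{h} \]
\[ = \frac{1}{h}(y(t_{i+1}) - w_{i+1}). \]

In a similar manner, we have

\[ \tilde{\tau}_{i+1}(h) = \frac{1}{h}(y(t_{i+1}) - \tilde{w}_{i+1}). \]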


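As a consequence, we have

\[ \tau_{i+1}(h) = \frac{1}{h}(y(t_{i+1}) - w_{i+1}) \]
\[ = \frac{1}{h}[(y(t_{i+1}) - \tilde{w}_{i+1}) + (\tilde{w}_{i+1} - w_{i+1})] \]
\[ = \tilde{\tau}_{i+1}(h) + \frac{1}{h}(\tilde{w}_{i+1} - w_{i+1}). \]

But $\tau_{i+1}(h)$ is $O(h^n)$ and $\tilde{\tau}_{i+1}(h)$ is $O(h^{n+1})$, so the significant portion of $\tau_{i+1}(h)$ must come
from

\[ \frac{1}{h}(\tilde{w}_{i+1} - w_{i+1}). \]

This gives us an easily computed approximation for the local truncation error of the $O(h^n)$
method:

\[ \tau_{i+1}(h) \approx \frac{1}{h}(\tilde{w}_{i+1} - w_{i+1}). \]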
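The estimate can be tried on a small case. The following Python sketch is only an illustration, not part of the text's development: it assumes the pair consisting of Euler's method ($n = 1$) and the Modified Euler method (one order higher), applied to the test problem $y' = y - t^2 + 1$, $y(0) = 0.5$ used elsewhere in this chapter. The function names are hypothetical.

```python
import math

def f(t, y):
    # Test problem from this chapter: y' = y - t^2 + 1, y(0) = 0.5
    return y - t**2 + 1

def exact(t):
    # Known exact solution of the test problem
    return (t + 1)**2 - 0.5 * math.exp(t)

def euler_step(t, w, h):
    # Lower-order method: local truncation error O(h)
    return w + h * f(t, w)

def modified_euler_step(t, w, h):
    # Higher-order method: local truncation error O(h^2)
    k1 = f(t, w)
    k2 = f(t + h, w + h * k1)
    return w + 0.5 * h * (k1 + k2)

t, w, h = 0.0, 0.5, 0.1
w_low = euler_step(t, w, h)             # plays the role of w_{i+1}
w_high = modified_euler_step(t, w, h)   # plays the role of w~_{i+1}

tau_est = (w_high - w_low) / h          # (1/h)(w~_{i+1} - w_{i+1})
tau_true = (exact(t + h) - w_low) / h   # (1/h)(y(t_{i+1}) - w_{i+1})
print(tau_est, tau_true)
```

Here the estimate differs from the true local truncation error of the Euler step only by the (smaller) truncation error of the higher-order method, which is what the derivation predicts.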


The object, however, is not simply to estimate the local truncation error but to adjust
the step size to keep it within a specified bound. To do this we now assume that since $\tau_{i+1}(h)$
is $O(h^n)$, a number $K$, independent of $h$, exists with

\[ \tau_{i+1}(h) \approx Kh^n. \]

Then the local truncation error produced by applying the $n$th-order method with a new step
size $qh$ can be estimated using the original approximations $w_{i+1}$ and $\tilde{w}_{i+1}$:

\[ \tau_{i+1}(qh) \approx K(qh)^n = q^n(Kh^n) \approx q^n \tau_{i+1}(h) \approx \frac{q^n}{h}(\tilde{w}_{i+1} - w_{i+1}). \]

To bound $\tau_{i+1}(qh)$ by $\varepsilon$, we choose $q$ so that

\[ \frac{q^n}{h}|\tilde{w}_{i+1} - w_{i+1}| \approx |\tau_{i+1}(qh)| \le \varepsilon; \]

that is, so that

\[ q \le \left(\frac{\varepsilon h}{|\tilde{w}_{i+1} - w_{i+1}|}\right)^{1/n}. \tag{5.22} \]

(Margin note: Erwin Fehlberg developed this and other error control techniques while working for
the NASA facility in Huntsville, Alabama during the 1960s. He received the Exceptional Scientific
Achievement Medal from NASA in 1969.)


Runge-Kutta-Fehlberg Method 


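One popular technique that uses Inequality (5.22) for error control is the Runge-Kutta-Fehlberg
method. (See [Fe].) This technique uses a Runge-Kutta method with local truncation
error of order five,

\[ \tilde{w}_{i+1} = w_i + \frac{16}{135}k_1 + \frac{6656}{12825}k_3 + \frac{28561}{56430}k_4 - \frac{9}{50}k_5 + \frac{2}{55}k_6, \]

to estimate the local error in a Runge-Kutta method of order four given by

\[ w_{i+1} = w_i + \frac{25}{216}k_1 + \frac{1408}{2565}k_3 + \frac{2197}{4104}k_4 - \frac{1}{5}k_5, \]

where the coefficient equations are

\[ k_1 = hf(t_i, w_i), \]
\[ k_2 = hf\left(t_i + \frac{h}{4}, w_i + \frac{1}{4}k_1\right), \]
\[ k_3 = hf\left(t_i + \frac{3h}{8}, w_i + \frac{3}{32}k_1 + \frac{9}{32}k_2\right), \]
\[ k_4 = hf\left(t_i + \frac{12h}{13}, w_i + \frac{1932}{2197}k_1 - \frac{7200}{2197}k_2 + \frac{7296}{2197}k_3\right), \]
\[ k_5 = hf\left(t_i + h, w_i + \frac{439}{216}k_1 - 8k_2 + \frac{3680}{513}k_3 - \frac{845}{4104}k_4\right), \]
\[ k_6 = hf\left(t_i + \frac{h}{2}, w_i - \frac{8}{27}k_1 + 2k_2 - \frac{3544}{2565}k_3 + \frac{1859}{4104}k_4 - \frac{11}{40}k_5\right). \]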

An advantage to this method is that only six evaluations of f are required per step. Arbitrary 
Runge-Kutta methods of orders four and five used together (see Table 5.9 on page 290) 
require at least four evaluations of f for the fourth-order method and an additional six for 
the fifth-order method, for a total of at least ten function evaluations. So the Runge-Kutta- 
Fehlberg method has at least a 40% decrease in the number of function evaluations over the 
use of a pair of arbitrary fourth- and fifth-order methods. 

In the error-control theory, an initial value of $h$ at the $i$th step is used to find the first values
of $w_{i+1}$ and $\tilde{w}_{i+1}$, which leads to the determination of $q$ for that step, and then the calculations
are repeated. This procedure requires twice the number of function evaluations per step
as without the error control. In practice, the value of $q$ to be used is chosen somewhat
differently in order to make the increased function-evaluation cost worthwhile. The value
of $q$ determined at the $i$th step is used for two purposes:

• When $q < 1$: to reject the initial choice of $h$ at the $i$th step and repeat the calculations
using $qh$, and

• When $q \ge 1$: to accept the computed value at the $i$th step using the step size $h$, but change
the step size to $qh$ for the $(i+1)$st step.

Because of the penalty in terms of function evaluations that must be paid if the steps are
repeated, $q$ tends to be chosen conservatively. In fact, for the Runge-Kutta-Fehlberg method
with $n = 4$, a common choice is

\[ q = \left(\frac{\varepsilon h}{2|\tilde{w}_{i+1} - w_{i+1}|}\right)^{1/4} = 0.84\left(\frac{\varepsilon h}{|\tilde{w}_{i+1} - w_{i+1}|}\right)^{1/4}. \]


In Algorithm 5.3 for the Runge-Kutta-Fehlberg method, Step 9 is added to eliminate 
large modifications in step size. This is done to avoid spending too much time with small step 
sizes in regions with irregularities in the derivatives of y, and to avoid large step sizes, which 
can result in skipping sensitive regions between the steps. The step-size increase procedure 
could be omitted completely from the algorithm, and the step-size decrease procedure used 
only when needed to bring the error under control. 


Algorithm 5.3: Runge-Kutta-Fehlberg

To approximate the solution of the initial-value problem

\[ y' = f(t, y), \quad a \le t \le b, \quad y(a) = \alpha, \]

with local truncation error within a given tolerance:

INPUT endpoints $a$, $b$; initial condition $\alpha$; tolerance TOL; maximum step size hmax;
minimum step size hmin.

OUTPUT $t$, $w$, $h$ where $w$ approximates $y(t)$ and the step size $h$ was used, or a message
that the minimum step size was exceeded.

Step 1 Set $t = a$;
    $w = \alpha$;
    $h$ = hmax;
    FLAG = 1;
    OUTPUT $(t, w)$.

Step 2 While (FLAG = 1) do Steps 3-11.

    Step 3 Set $K_1 = hf(t, w)$;
        $K_2 = hf\left(t + \frac{1}{4}h, w + \frac{1}{4}K_1\right)$;
        $K_3 = hf\left(t + \frac{3}{8}h, w + \frac{3}{32}K_1 + \frac{9}{32}K_2\right)$;
        $K_4 = hf\left(t + \frac{12}{13}h, w + \frac{1932}{2197}K_1 - \frac{7200}{2197}K_2 + \frac{7296}{2197}K_3\right)$;
        $K_5 = hf\left(t + h, w + \frac{439}{216}K_1 - 8K_2 + \frac{3680}{513}K_3 - \frac{845}{4104}K_4\right)$;
        $K_6 = hf\left(t + \frac{1}{2}h, w - \frac{8}{27}K_1 + 2K_2 - \frac{3544}{2565}K_3 + \frac{1859}{4104}K_4 - \frac{11}{40}K_5\right)$.

    Step 4 Set $R = \frac{1}{h}\left|\frac{1}{360}K_1 - \frac{128}{4275}K_3 - \frac{2197}{75240}K_4 + \frac{1}{50}K_5 + \frac{2}{55}K_6\right|$.
        (Note: $R = \frac{1}{h}|\tilde{w}_{i+1} - w_{i+1}|$.)

    Step 5 If $R \le$ TOL then do Steps 6 and 7.

        Step 6 Set $t = t + h$; (Approximation accepted.)
            $w = w + \frac{25}{216}K_1 + \frac{1408}{2565}K_3 + \frac{2197}{4104}K_4 - \frac{1}{5}K_5$.

        Step 7 OUTPUT $(t, w, h)$.

    Step 8 Set $\delta = 0.84(\text{TOL}/R)^{1/4}$.

    Step 9 If $\delta \le 0.1$ then set $h = 0.1h$
        else if $\delta \ge 4$ then set $h = 4h$
        else set $h = \delta h$. (Calculate new h.)

    Step 10 If $h >$ hmax then set $h$ = hmax.

    Step 11 If $t \ge b$ then set FLAG = 0
        else if $t + h > b$ then set $h = b - t$
        else if $h <$ hmin then
            set FLAG = 0;
            OUTPUT (‘minimum h exceeded’).
            (Procedure completed unsuccessfully.)

Step 12 (The procedure is complete.)
    STOP.
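As an illustration only, the steps above can be transcribed into Python roughly as follows. The function name `rkf45`, the return convention (a list of accepted rows plus a success flag), and the guard for $R = 0$ are assumptions of this sketch and are not part of Algorithm 5.3.

```python
def rkf45(f, a, b, alpha, tol, hmax, hmin):
    """Sketch of Algorithm 5.3 (Runge-Kutta-Fehlberg).

    Returns (rows, ok): rows holds (t, w, h) for every accepted step
    (h is None for the initial point); ok is False if h fell below hmin.
    """
    t, w, h = a, alpha, hmax                          # Step 1
    rows = [(t, w, None)]
    while True:                                       # Step 2
        # Step 3: the six stage values
        k1 = h * f(t, w)
        k2 = h * f(t + h / 4, w + k1 / 4)
        k3 = h * f(t + 3 * h / 8, w + 3 * k1 / 32 + 9 * k2 / 32)
        k4 = h * f(t + 12 * h / 13,
                   w + 1932 * k1 / 2197 - 7200 * k2 / 2197 + 7296 * k3 / 2197)
        k5 = h * f(t + h,
                   w + 439 * k1 / 216 - 8 * k2 + 3680 * k3 / 513 - 845 * k4 / 4104)
        k6 = h * f(t + h / 2,
                   w - 8 * k1 / 27 + 2 * k2 - 3544 * k3 / 2565
                   + 1859 * k4 / 4104 - 11 * k5 / 40)
        # Step 4: R = |w~_{i+1} - w_{i+1}| / h
        R = abs(k1 / 360 - 128 * k3 / 4275 - 2197 * k4 / 75240
                + k5 / 50 + 2 * k6 / 55) / h
        if R <= tol:                                  # Step 5
            t = t + h                                 # Step 6: accept the step
            w = w + 25 * k1 / 216 + 1408 * k3 / 2565 + 2197 * k4 / 4104 - k5 / 5
            rows.append((t, w, h))                    # Step 7
        # Step 8 (the R == 0 guard is an addition of this sketch)
        delta = 0.84 * (tol / R) ** 0.25 if R > 0 else 4.0
        # Step 9: limit how much h may change in one step
        if delta <= 0.1:
            h = 0.1 * h
        elif delta >= 4:
            h = 4 * h
        else:
            h = delta * h
        h = min(h, hmax)                              # Step 10
        # Step 11: termination tests
        if t >= b:
            return rows, True
        if t + h > b:
            h = b - t
        elif h < hmin:
            return rows, False                        # minimum h exceeded


if __name__ == "__main__":
    # Test problem from this chapter: y' = y - t^2 + 1, y(0) = 0.5 on [0, 2]
    rows, ok = rkf45(lambda t, y: y - t**2 + 1, 0.0, 2.0, 0.5, 1e-5, 0.25, 0.01)
    for t, w, h in rows:
        print(t, w, h)
```

The sketch keeps the step numbering of the algorithm in its comments so the two can be compared line by line. Returning a flag instead of printing a message is a design choice made here so that a caller can react to failure.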


Example 1 Use the Runge-Kutta-Fehlberg method with a tolerance TOL = $10^{-5}$, a maximum step size
hmax = 0.25, and a minimum step size hmin = 0.01 to approximate the solution to the
initial-value problem

\[ y' = y - t^2 + 1, \quad 0 \le t \le 2, \quad y(0) = 0.5, \]

and compare the results with the exact solution $y(t) = (t + 1)^2 - 0.5e^t$.

Solution We will work through the first step of the calculations and then apply Algorithm
5.3 to determine the remaining results. The initial condition gives $t_0 = 0$ and $w_0 = 0.5$. To
determine $w_1$ using $h = 0.25$, the maximum allowable step size, we compute

\[ k_1 = hf(t_0, w_0) = 0.25\left(0.5 - 0^2 + 1\right) = 0.375; \]

\[ k_2 = hf\left(t_0 + \frac{1}{4}h, w_0 + \frac{1}{4}k_1\right) = 0.25 f\left(\frac{1}{4}0.25, 0.5 + \frac{1}{4}0.375\right) = 0.3974609; \]

\[ k_3 = hf\left(t_0 + \frac{3}{8}h, w_0 + \frac{3}{32}k_1 + \frac{9}{32}k_2\right) \]
\[ = 0.25 f\left(0.09375, 0.5 + \frac{3}{32}0.375 + \frac{9}{32}0.3974609\right) = 0.4095383; \]

\[ k_4 = hf\left(t_0 + \frac{12}{13}h, w_0 + \frac{1932}{2197}k_1 - \frac{7200}{2197}k_2 + \frac{7296}{2197}k_3\right) \]
\[ = 0.25 f\left(0.2307692, 0.5 + \frac{1932}{2197}0.375 - \frac{7200}{2197}0.3974609 + \frac{7296}{2197}0.4095383\right) \]
\[ = 0.4584971; \]

\[ k_5 = hf\left(t_0 + h, w_0 + \frac{439}{216}k_1 - 8k_2 + \frac{3680}{513}k_3 - \frac{845}{4104}k_4\right) \]
\[ = 0.25 f\left(0.25, 0.5 + \frac{439}{216}0.375 - 8(0.3974609) + \frac{3680}{513}0.4095383 - \frac{845}{4104}0.4584971\right) \]
\[ = 0.4658452; \]

\[ k_6 = hf\left(t_0 + \frac{1}{2}h, w_0 - \frac{8}{27}k_1 + 2k_2 - \frac{3544}{2565}k_3 + \frac{1859}{4104}k_4 - \frac{11}{40}k_5\right) \]
\[ = 0.25 f\left(0.125, 0.5 - \frac{8}{27}0.375 + 2(0.3974609) - \frac{3544}{2565}0.4095383 + \frac{1859}{4104}0.4584971 - \frac{11}{40}0.4658452\right) \]
\[ = 0.4204789. \]


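The two approximations to $y(0.25)$ are then found to be

\[ \tilde{w}_1 = w_0 + \frac{16}{135}k_1 + \frac{6656}{12825}k_3 + \frac{28561}{56430}k_4 - \frac{9}{50}k_5 + \frac{2}{55}k_6 \]
\[ = 0.5 + \frac{16}{135}0.375 + \frac{6656}{12825}0.4095383 + \frac{28561}{56430}0.4584971 - \frac{9}{50}0.4658452 + \frac{2}{55}0.4204789 \]
\[ = 0.9204870, \]

and

\[ w_1 = w_0 + \frac{25}{216}k_1 + \frac{1408}{2565}k_3 + \frac{2197}{4104}k_4 - \frac{1}{5}k_5 \]
\[ = 0.5 + \frac{25}{216}0.375 + \frac{1408}{2565}0.4095383 + \frac{2197}{4104}0.4584971 - \frac{1}{5}0.4658452 \]
\[ = 0.9204886. \]

This also implies that

\[ R = \frac{1}{0.25}\left|\frac{1}{360}k_1 - \frac{128}{4275}k_3 - \frac{2197}{75240}k_4 + \frac{1}{50}k_5 + \frac{2}{55}k_6\right| \]
\[ = 4\left|\frac{1}{360}0.375 - \frac{128}{4275}0.4095383 - \frac{2197}{75240}0.4584971 + \frac{1}{50}0.4658452 + \frac{2}{55}0.4204789\right| \]
\[ = 0.00000621388, \]

and

\[ q = 0.84\left(\frac{\varepsilon}{R}\right)^{1/4} = 0.84\left(\frac{0.00001}{0.00000621388}\right)^{1/4} = 0.9461033291. \]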

Since $R \le$ TOL we can accept the approximation 0.9204886 for $y(0.25)$, but because $q < 1$ we should
adjust the step size for the next iteration to $h = 0.9461033291(0.25) \approx 0.2365258$. However,
only the leading 5 digits of this result would be expected to be accurate because $R$ has only
about 5 digits of accuracy. Because we are effectively subtracting the nearly equal numbers
$w_i$ and $\tilde{w}_i$ when we compute $R$, there is a good likelihood of round-off error. This is an
additional reason for being conservative when computing $q$.

The results from the algorithm are shown in Table 5.11. Increased accuracy has been
used to ensure that the calculations are accurate to all listed places. The last two columns
in Table 5.11 show the results of the fifth-order method. For small values of $t$, the error is
less than the error in the fourth-order method, but the error exceeds that of the fourth-order
method as $t$ increases.

Table 5.11

                                        RKF-4                                                   RKF-5
t_i         y_i = y(t_i)   w_i         h_i         R_i             |y_i - w_i|      \tilde{w}_i   |y_i - \tilde{w}_i|
0           0.5            0.5                                                      0.5
0.2500000   0.9204873      0.9204886   0.2500000   6.2 × 10^{-6}   1.3 × 10^{-6}    0.9204870     2.424 × 10^{-7}
0.4865522   1.3964884      1.3964910   0.2365522   4.5 × 10^{-6}   2.6 × 10^{-6}    1.3964900     1.510 × 10^{-6}
0.7293332   1.9537446      1.9537488   0.2427810   4.3 × 10^{-6}   4.2 × 10^{-6}    1.9537477     3.136 × 10^{-6}
0.9793332   2.5864198      2.5864260   0.2500000   3.8 × 10^{-6}   6.2 × 10^{-6}    2.5864251     5.242 × 10^{-6}
1.2293332   3.2604520      3.2604605   0.2500000   2.4 × 10^{-6}   8.5 × 10^{-6}    3.2604599     7.895 × 10^{-6}
1.4793332   3.9520844      3.9520955   0.2500000   7 × 10^{-7}     1.11 × 10^{-5}   3.9520954     1.096 × 10^{-5}
1.7293332   4.6308127      4.6308268   0.2500000   1.5 × 10^{-6}   1.41 × 10^{-5}   4.6308272     1.446 × 10^{-5}
1.9793332   5.2574687      5.2574861   0.2500000   4.3 × 10^{-6}   1.73 × 10^{-5}   5.2574871     1.839 × 10^{-5}
2.0000000   5.3054720      5.3054896   0.0206668                   1.77 × 10^{-5}   5.3054896     1.768 × 10^{-5}


An implementation of the Runge-Kutta-Fehlberg method is also available in Maple
using the InitialValueProblem command. However, it differs from our presentation because
it does not require the specification of a tolerance for the solution. For our example problem
it is called with

C := InitialValueProblem(deq, y(0) = 0.5, t = 2, method = rungekutta, submethod =
rkf, numsteps = 10, output = information, digits = 8)

As usual, the information is placed in a table that is accessed by double clicking on the
output. The results can be printed using the method outlined in previous sections.


EXERCISE SET 5.5 


1. Use the Runge-Kutta-Fehlberg method with tolerance TOL = $10^{-4}$, hmax = 0.25, and hmin = 0.05
to approximate the solutions to the following initial-value problems. Compare the results to the actual
values.

a. $y' = te^{3t} - 2y$, $0 \le t \le 1$, $y(0) = 0$; actual solution $y(t) = \frac{1}{5}te^{3t} - \frac{1}{25}e^{3t} + \frac{1}{25}e^{-2t}$.
b. $y' = 1 + (t - y)^2$, $2 \le t \le 3$, $y(2) = 1$; actual solution $y(t) = t + 1/(1 - t)$.
c. $y' = 1 + y/t$, $1 \le t \le 2$, $y(1) = 2$; actual solution $y(t) = t \ln t + 2t$.
d. $y' = \cos 2t + \sin 3t$, $0 \le t \le 1$, $y(0) = 1$; actual solution $y(t) = \frac{1}{2}\sin 2t - \frac{1}{3}\cos 3t + \frac{4}{3}$.

2. Use the Runge-Kutta-Fehlberg Algorithm with tolerance TOL = $10^{-4}$ to approximate the solution to
the following initial-value problems.

a. $y' = (y/t)^2 + y/t$, $1 \le t \le 1.2$, $y(1) = 1$, with hmax = 0.05 and hmin = 0.02.
b. $y' = \sin t + e^{-t}$, $0 \le t \le 1$, $y(0) = 0$, with hmax = 0.25 and hmin = 0.02.
c. $y' = (y^2 + y)/t$, $1 \le t \le 3$, $y(1) = -2$, with hmax = 0.5 and hmin = 0.02.
d. $y' = t^2$, $0 \le t \le 2$, $y(0) = 0$, with hmax = 0.5 and hmin = 0.02.

3. Use the Runge-Kutta-Fehlberg method with tolerance TOL = $10^{-6}$, hmax = 0.5, and hmin = 0.05 to
approximate the solutions to the following initial-value problems. Compare the results to the actual
values.

a. $y' = y/t - (y/t)^2$, $1 \le t \le 4$, $y(1) = 1$; actual solution $y(t) = t/(1 + \ln t)$.
b. $y' = 1 + y/t + (y/t)^2$, $1 \le t \le 3$, $y(1) = 0$; actual solution $y(t) = t \tan(\ln t)$.
c. $y' = -(y + 1)(y + 3)$, $0 \le t \le 3$, $y(0) = -2$; actual solution $y(t) = -3 + 2(1 + e^{-2t})^{-1}$.
d. $y' = (t + 2t^3)y^3 - ty$, $0 \le t \le 2$, $y(0) = \frac{1}{3}$; actual solution $y(t) = (3 + 2t^2 + 6e^{t^2})^{-1/2}$.
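4. The Runge-Kutta-Verner method (see [Ve]) is based on the formulas

\[ w_{i+1} = w_i + \frac{13}{160}k_1 + \frac{2375}{5984}k_3 + \frac{5}{16}k_4 + \frac{12}{85}k_5 + \frac{3}{44}k_6 \quad \text{and} \]

\[ \tilde{w}_{i+1} = w_i + \frac{3}{40}k_1 + \frac{875}{2244}k_3 + \frac{23}{72}k_4 + \frac{264}{1955}k_5 + \frac{125}{11592}k_7 + \frac{43}{616}k_8, \]

where

\[ k_1 = hf(t_i, w_i), \]
\[ k_2 = hf\left(t_i + \frac{h}{6}, w_i + \frac{1}{6}k_1\right), \]
\[ k_3 = hf\left(t_i + \frac{4h}{15}, w_i + \frac{4}{75}k_1 + \frac{16}{75}k_2\right), \]
\[ k_4 = hf\left(t_i + \frac{2h}{3}, w_i + \frac{5}{6}k_1 - \frac{8}{3}k_2 + \frac{5}{2}k_3\right), \]
\[ k_5 = hf\left(t_i + \frac{5h}{6}, w_i - \frac{165}{64}k_1 + \frac{55}{6}k_2 - \frac{425}{64}k_3 + \frac{85}{96}k_4\right), \]
\[ k_6 = hf\left(t_i + h, w_i + \frac{12}{5}k_1 - 8k_2 + \frac{4015}{612}k_3 - \frac{11}{36}k_4 + \frac{88}{255}k_5\right), \]
\[ k_7 = hf\left(t_i + \frac{h}{15}, w_i - \frac{8263}{15000}k_1 + \frac{124}{75}k_2 - \frac{643}{680}k_3 - \frac{81}{250}k_4 + \frac{2484}{10625}k_5\right), \]
\[ k_8 = hf\left(t_i + h, w_i + \frac{3501}{1720}k_1 - \frac{300}{43}k_2 + \frac{297275}{52632}k_3 - \frac{319}{2322}k_4 + \frac{24068}{84065}k_5 + \frac{3850}{26703}k_7\right). \]

The sixth-order method $\tilde{w}_{i+1}$ is used to estimate the error in the fifth-order method $w_{i+1}$. Construct
an algorithm similar to the Runge-Kutta-Fehlberg Algorithm, and repeat Exercise 3 using this new
method.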

5. In the theory of the spread of contagious disease (see [Ba1] or [Ba2]), a relatively elementary
differential equation can be used to predict the number of infective individuals in the population at any
time, provided appropriate simplification assumptions are made. In particular, let us assume that all
individuals in a fixed population have an equally likely chance of being infected and once infected
remain in that state. Suppose $x(t)$ denotes the number of susceptible individuals at time $t$ and $y(t)$
denotes the number of infectives. It is reasonable to assume that the rate at which the number of
infectives changes is proportional to the product of $x(t)$ and $y(t)$ because the rate depends on both the
number of infectives and the number of susceptibles present at that time. If the population is large
enough to assume that $x(t)$ and $y(t)$ are continuous variables, the problem can be expressed

\[ y'(t) = kx(t)y(t), \]

where $k$ is a constant and $x(t) + y(t) = m$, the total population. This equation can be rewritten
involving only $y(t)$ as

\[ y'(t) = k(m - y(t))y(t). \]

a. Assuming that $m = 100{,}000$, $y(0) = 1000$, $k = 2 \times 10^{-6}$, and that time is measured in days,
find an approximation to the number of infective individuals at the end of 30 days.

b. The differential equation in part (a) is called a Bernoulli equation and it can be transformed into
a linear differential equation in $u(t) = (y(t))^{-1}$. Use this technique to find the exact solution to
the equation, under the same assumptions as in part (a), and compare the true value of $y(t)$ to
the approximation given there. What is $\lim_{t \to \infty} y(t)$? Does this agree with your intuition?


6. In the previous exercise, all infected individuals remained in the population to spread the disease.
A more realistic proposal is to introduce a third variable $z(t)$ to represent the number of individuals
who are removed from the affected population at a given time $t$ by isolation, recovery and consequent
immunity, or death. This quite naturally complicates the problem, but it can be shown (see [Ba2]) that
an approximate solution can be given in the form

\[ x(t) = x(0)e^{-(k_1/k_2)z(t)} \quad \text{and} \quad y(t) = m - x(t) - z(t), \]

where $k_1$ is the infective rate, $k_2$ is the removal rate, and $z(t)$ is determined from the differential
equation

\[ z'(t) = k_2\left(m - z(t) - x(0)e^{-(k_1/k_2)z(t)}\right). \]

The authors are not aware of any technique for solving this problem directly, so a numerical procedure
must be applied. Find an approximation to $z(30)$, $y(30)$, and $x(30)$, assuming that $m = 100{,}000$,
$x(0) = 99{,}000$, $k_1 = 2 \times 10^{-6}$, and $k_2 = 10^{-4}$.


5.6 Multistep Methods


The methods discussed to this point in the chapter are called one-step methods because the
approximation for the mesh point $t_{i+1}$ involves information from only one of the previous
mesh points, $t_i$. Although these methods might use function evaluation information at points
between $t_i$ and $t_{i+1}$, they do not retain that information for direct use in future approximations.
All the information used by these methods is obtained within the subinterval over which
the solution is being approximated.

The approximate solution is available at each of the mesh points $t_0, t_1, \ldots, t_i$ before the
approximation at $t_{i+1}$ is obtained, and because the error $|w_j - y(t_j)|$ tends to increase with $j$,
it seems reasonable to develop methods that use these more accurate previous data when
approximating the solution at $t_{i+1}$.

Methods using the approximation at more than one previous mesh point to determine 
the approximation at the next point are called multistep methods. The precise definition of 
these methods follows, together with the definition of the two types of multistep methods. 


Definition 5.14 An $m$-step multistep method for solving the initial-value problem

\[ y' = f(t, y), \quad a \le t \le b, \quad y(a) = \alpha, \tag{5.23} \]

has a difference equation for finding the approximation $w_{i+1}$ at the mesh point $t_{i+1}$
represented by the following equation, where $m$ is an integer greater than 1:

\[ w_{i+1} = a_{m-1}w_i + a_{m-2}w_{i-1} + \cdots + a_0 w_{i+1-m} \]
\[ \quad + h[b_m f(t_{i+1}, w_{i+1}) + b_{m-1} f(t_i, w_i) + \cdots + b_0 f(t_{i+1-m}, w_{i+1-m})], \tag{5.24} \]

for $i = m-1, m, \ldots, N-1$, where $h = (b - a)/N$, the $a_0, a_1, \ldots, a_{m-1}$ and $b_0, b_1, \ldots, b_m$
are constants, and the starting values

\[ w_0 = \alpha, \quad w_1 = \alpha_1, \quad w_2 = \alpha_2, \quad \ldots, \quad w_{m-1} = \alpha_{m-1} \]

are specified.


When $b_m = 0$ the method is called explicit, or open, because Eq. (5.24) then gives
$w_{i+1}$ explicitly in terms of previously determined values. When $b_m \ne 0$ the method is called
implicit, or closed, because $w_{i+1}$ occurs on both sides of Eq. (5.24), so $w_{i+1}$ is specified
only implicitly.

For example, the equations

\[ w_0 = \alpha, \quad w_1 = \alpha_1, \quad w_2 = \alpha_2, \quad w_3 = \alpha_3, \]
\[ w_{i+1} = w_i + \frac{h}{24}[55 f(t_i, w_i) - 59 f(t_{i-1}, w_{i-1}) + 37 f(t_{i-2}, w_{i-2}) - 9 f(t_{i-3}, w_{i-3})], \tag{5.25} \]

for each $i = 3, 4, \ldots, N-1$, define an explicit four-step method known as the fourth-order
Adams-Bashforth technique. The equations

\[ w_0 = \alpha, \quad w_1 = \alpha_1, \quad w_2 = \alpha_2, \]
\[ w_{i+1} = w_i + \frac{h}{24}[9 f(t_{i+1}, w_{i+1}) + 19 f(t_i, w_i) - 5 f(t_{i-1}, w_{i-1}) + f(t_{i-2}, w_{i-2})], \tag{5.26} \]

for each $i = 2, 3, \ldots, N-1$, define an implicit three-step method known as the fourth-order
Adams-Moulton technique.

(Margin note: The Adams-Bashforth techniques are due to John Couch Adams (1819-1892), who did
significant work in mathematics and astronomy. He developed these numerical techniques to
approximate the solution of a fluid-flow problem posed by Bashforth.)

(Margin note: Forest Ray Moulton (1872-1952) was in charge of ballistics at the Aberdeen Proving
Grounds in Maryland during World War I. He was a prolific author, writing numerous books in
mathematics and astronomy, and developed improved multistep methods for solving ballistic
equations.)

The starting values in either (5.25) or (5.26) must be specified, generally by assuming
$w_0 = \alpha$ and generating the remaining values by either a Runge-Kutta or Taylor method. We
will see that the implicit methods are generally more accurate than the explicit methods,
but to apply an implicit method such as (5.26) directly, we must solve the implicit equation
for $w_{i+1}$. This is not always possible, and even when it can be done the solution for $w_{i+1}$
may not be unique.

Example 1 In Example 3 of Section 5.4 (see Table 5.8 on page 289) we used the Runge-Kutta method
of order four with $h = 0.2$ to approximate the solutions to the initial-value problem

\[ y' = y - t^2 + 1, \quad 0 \le t \le 2, \quad y(0) = 0.5. \]

The first four approximations were found to be $y(0) = w_0 = 0.5$, $y(0.2) \approx w_1 =
0.8292933$, $y(0.4) \approx w_2 = 1.2140762$, and $y(0.6) \approx w_3 = 1.6489220$. Use these as
starting values for the fourth-order Adams-Bashforth method to compute new approximations
for $y(0.8)$ and $y(1.0)$, and compare these new approximations to those produced by
the Runge-Kutta method of order four.


Solution For the fourth-order Adams-Bashforth method we have

\[ y(0.8) \approx w_4 = w_3 + \frac{0.2}{24}(55 f(0.6, w_3) - 59 f(0.4, w_2) + 37 f(0.2, w_1) - 9 f(0, w_0)) \]
\[ = 1.6489220 + \frac{0.2}{24}(55 f(0.6, 1.6489220) - 59 f(0.4, 1.2140762) + 37 f(0.2, 0.8292933) - 9 f(0, 0.5)) \]
\[ = 1.6489220 + 0.0083333(55(2.2889220) - 59(2.0540762) + 37(1.7892933) - 9(1.5)) \]
\[ = 2.1272892, \]

and

\[ y(1.0) \approx w_5 = w_4 + \frac{0.2}{24}(55 f(0.8, w_4) - 59 f(0.6, w_3) + 37 f(0.4, w_2) - 9 f(0.2, w_1)) \]
\[ = 2.1272892 + \frac{0.2}{24}(55 f(0.8, 2.1272892) - 59 f(0.6, 1.6489220) + 37 f(0.4, 1.2140762) - 9 f(0.2, 0.8292933)) \]
\[ = 2.1272892 + 0.0083333(55(2.4872892) - 59(2.2889220) + 37(2.0540762) - 9(1.7892933)) \]
\[ = 2.6410533. \]

The errors for these approximations at $t = 0.8$ and $t = 1.0$ are, respectively,

\[ |2.1272295 - 2.1272892| = 5.97 \times 10^{-5} \quad \text{and} \quad |2.6408591 - 2.6410533| = 1.94 \times 10^{-4}. \]

The corresponding Runge-Kutta approximations had errors

\[ |2.1272295 - 2.1272027| \approx 2.69 \times 10^{-5} \quad \text{and} \quad |2.6408591 - 2.6408227| = 3.64 \times 10^{-5}. \]

(Margin note: Adams was particularly interested in using his ability for accurate numerical
calculations to investigate the orbits of the planets. He predicted the existence of Neptune by
analyzing the irregularities in the planet Uranus, and developed various numerical integration
techniques to assist in the approximation of the solution of differential equations.)
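The two Adams-Bashforth steps of this example can be reproduced with a few lines of Python. This is only an illustrative sketch: the helper name `ab4_step` and the list-based bookkeeping are assumptions made here, while the starting values are the Runge-Kutta values quoted in the example.

```python
def f(t, y):
    # Right-hand side of the example problem y' = y - t^2 + 1
    return y - t**2 + 1

def ab4_step(h, t, w):
    """One fourth-order Adams-Bashforth step, Eq. (5.25).

    t and w hold the four most recent mesh points and approximations,
    oldest first, so t[3], w[3] correspond to t_i, w_i.
    """
    return w[3] + h / 24 * (55 * f(t[3], w[3]) - 59 * f(t[2], w[2])
                            + 37 * f(t[1], w[1]) - 9 * f(t[0], w[0]))

h = 0.2
t = [0.0, 0.2, 0.4, 0.6]
w = [0.5, 0.8292933, 1.2140762, 1.6489220]   # RK4 starting values from the example

for _ in range(2):                 # compute w_4 and w_5
    w_next = ab4_step(h, t[-4:], w[-4:])
    t.append(t[-1] + h)
    w.append(w_next)

print(w[4], w[5])                  # approximations to y(0.8) and y(1.0)
```

Only one new evaluation of `f` is actually needed per step if the previous values of `f` are stored; the sketch recomputes them for brevity.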


To begin the derivation of a multistep method, note that the solution to the initial-value
problem

\[ y' = f(t, y), \quad a \le t \le b, \quad y(a) = \alpha, \]

if integrated over the interval $[t_i, t_{i+1}]$, has the property that

\[ y(t_{i+1}) - y(t_i) = \int_{t_i}^{t_{i+1}} y'(t)\, dt = \int_{t_i}^{t_{i+1}} f(t, y(t))\, dt. \]

Consequently,

\[ y(t_{i+1}) = y(t_i) + \int_{t_i}^{t_{i+1}} f(t, y(t))\, dt. \tag{5.27} \]

However we cannot integrate $f(t, y(t))$ without knowing $y(t)$, the solution to the problem,
so we instead integrate an interpolating polynomial $P(t)$ to $f(t, y(t))$, one that is
determined by some of the previously obtained data points $(t_0, w_0), (t_1, w_1), \ldots, (t_i, w_i)$.
When we assume, in addition, that $y(t_i) \approx w_i$, Eq. (5.27) becomes

\[ y(t_{i+1}) \approx w_i + \int_{t_i}^{t_{i+1}} P(t)\, dt. \tag{5.28} \]

Although any form of the interpolating polynomial can be used for the derivation, it is most
convenient to use the Newton backward-difference formula, because this form more easily
incorporates the most recently calculated data.

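To derive an Adams-Bashforth explicit $m$-step technique, we form the backward-difference
polynomial $P_{m-1}(t)$ through

\[ (t_i, f(t_i, y(t_i))), \quad (t_{i-1}, f(t_{i-1}, y(t_{i-1}))), \quad \ldots, \quad (t_{i+1-m}, f(t_{i+1-m}, y(t_{i+1-m}))). \]

Since $P_{m-1}(t)$ is an interpolatory polynomial of degree $m - 1$, some number $\xi_i$ in $(t_{i+1-m}, t_i)$
exists with

\[ f(t, y(t)) = P_{m-1}(t) + \frac{f^{(m)}(\xi_i, y(\xi_i))}{m!}(t - t_i)(t - t_{i-1}) \cdots (t - t_{i+1-m}). \]

Introducing the variable substitution $t = t_i + sh$, with $dt = h\, ds$, into $P_{m-1}(t)$ and the error
term implies that

\[ \int_{t_i}^{t_{i+1}} f(t, y(t))\, dt = \int_{t_i}^{t_{i+1}} \sum_{k=0}^{m-1} (-1)^k \binom{-s}{k} \nabla^k f(t_i, y(t_i))\, dt \]
\[ \quad + \int_{t_i}^{t_{i+1}} \frac{f^{(m)}(\xi_i, y(\xi_i))}{m!}(t - t_i)(t - t_{i-1}) \cdots (t - t_{i+1-m})\, dt \]
\[ = \sum_{k=0}^{m-1} \nabla^k f(t_i, y(t_i))\, h (-1)^k \int_0^1 \binom{-s}{k}\, ds \]
\[ \quad + \frac{h^{m+1}}{m!} \int_0^1 s(s+1) \cdots (s+m-1) f^{(m)}(\xi_i, y(\xi_i))\, ds. \]

The integrals $(-1)^k \int_0^1 \binom{-s}{k}\, ds$ for various values of $k$ are easily evaluated and are listed in
Table 5.12. For example, when $k = 3$,

\[ (-1)^3 \int_0^1 \binom{-s}{3}\, ds = -\int_0^1 \frac{(-s)(-s-1)(-s-2)}{1 \cdot 2 \cdot 3}\, ds \]
\[ = \frac{1}{6} \int_0^1 (s^3 + 3s^2 + 2s)\, ds \]
\[ = \frac{1}{6}\left[\frac{s^4}{4} + s^3 + s^2\right]_0^1 = \frac{1}{6}\left(\frac{9}{4}\right) = \frac{3}{8}. \]

Table 5.12

k       (-1)^k \int_0^1 \binom{-s}{k} ds
0       1
1       1/2
2       5/12
3       3/8
4       251/720
5       95/288

As a consequence,

\[ \int_{t_i}^{t_{i+1}} f(t, y(t))\, dt = h\left[f(t_i, y(t_i)) + \frac{1}{2}\nabla f(t_i, y(t_i)) + \frac{5}{12}\nabla^2 f(t_i, y(t_i)) + \cdots\right] \]
\[ \quad + \frac{h^{m+1}}{m!} \int_0^1 s(s+1) \cdots (s+m-1) f^{(m)}(\xi_i, y(\xi_i))\, ds. \tag{5.29} \]

Because $s(s+1) \cdots (s+m-1)$ does not change sign on $[0, 1]$, the Weighted Mean
Value Theorem for Integrals can be used to deduce that for some number $\mu_i$, where
$t_{i+1-m} < \mu_i < t_{i+1}$, the error term in Eq. (5.29) becomes

\[ \frac{h^{m+1}}{m!} \int_0^1 s(s+1) \cdots (s+m-1) f^{(m)}(\xi_i, y(\xi_i))\, ds = \frac{h^{m+1} f^{(m)}(\mu_i, y(\mu_i))}{m!} \int_0^1 s(s+1) \cdots (s+m-1)\, ds. \]

Hence the error in (5.29) simplifies to

\[ h^{m+1} f^{(m)}(\mu_i, y(\mu_i)) (-1)^m \int_0^1 \binom{-s}{m}\, ds. \tag{5.30} \]

But $y(t_{i+1}) - y(t_i) = \int_{t_i}^{t_{i+1}} f(t, y(t))\, dt$, so Eq. (5.27) can be written as

\[ y(t_{i+1}) = y(t_i) + h\left[f(t_i, y(t_i)) + \frac{1}{2}\nabla f(t_i, y(t_i)) + \frac{5}{12}\nabla^2 f(t_i, y(t_i)) + \cdots\right] \]
\[ \quad + h^{m+1} f^{(m)}(\mu_i, y(\mu_i)) (-1)^m \int_0^1 \binom{-s}{m}\, ds. \tag{5.31} \]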
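The entries of Table 5.12 can be checked with exact rational arithmetic. The sketch below is an illustration rather than part of the derivation; it relies on the identity $(-1)^k \binom{-s}{k} = s(s+1)\cdots(s+k-1)/k!$, which is the same simplification used in the $k = 3$ calculation, and the function name `table_entry` is a hypothetical choice.

```python
from fractions import Fraction
from math import factorial

def table_entry(k):
    """Return (-1)^k * integral over [0, 1] of binom(-s, k) ds as an exact Fraction."""
    # Build the coefficients of s(s+1)...(s+k-1), lowest degree first.
    coeffs = [Fraction(1)]
    for j in range(k):
        new = [Fraction(0)] * (len(coeffs) + 1)
        for d, c in enumerate(coeffs):
            new[d] += c * j        # contribution of the constant term j
            new[d + 1] += c        # contribution of the factor s
        coeffs = new
    # Integrate term by term over [0, 1]: s^d integrates to 1/(d+1).
    integral = sum(c / (d + 1) for d, c in enumerate(coeffs))
    return integral / factorial(k)

for k in range(6):
    print(k, table_entry(k))
```

Using `Fraction` avoids any floating-point rounding, so the printed values can be compared with the table directly.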
Example 2 Use Eq. (5.31) with $m = 3$ to derive the three-step Adams-Bashforth technique.

Solution We have

\[
\begin{aligned}
y(t_{i+1}) &\approx y(t_i) + h \left[ f(t_i, y(t_i)) + \frac{1}{2} \nabla f(t_i, y(t_i)) + \frac{5}{12} \nabla^2 f(t_i, y(t_i)) \right] \\
&= y(t_i) + h \Big\{ f(t_i, y(t_i)) + \frac{1}{2} \big[ f(t_i, y(t_i)) - f(t_{i-1}, y(t_{i-1})) \big] \\
&\qquad + \frac{5}{12} \big[ f(t_i, y(t_i)) - 2 f(t_{i-1}, y(t_{i-1})) + f(t_{i-2}, y(t_{i-2})) \big] \Big\} \\
&= y(t_i) + \frac{h}{12} \big[ 23 f(t_i, y(t_i)) - 16 f(t_{i-1}, y(t_{i-1})) + 5 f(t_{i-2}, y(t_{i-2})) \big].
\end{aligned}
\]

The three-step Adams-Bashforth method is, consequently,

\[
w_0 = \alpha, \quad w_1 = \alpha_1, \quad w_2 = \alpha_2,
\]
\[
w_{i+1} = w_i + \frac{h}{12} \big[ 23 f(t_i, w_i) - 16 f(t_{i-1}, w_{i-1}) + 5 f(t_{i-2}, w_{i-2}) \big],
\]

for $i = 2, 3, \dots, N-1$.

Multistep methods can also be derived using Taylor series. An example of the procedure involved is considered in Exercise 12. A derivation using a Lagrange interpolating polynomial is discussed in Exercise 11.

The local truncation error for multistep methods is defined analogously to that of one-step methods. As in the case of one-step methods, the local truncation error provides a measure of how the solution to the differential equation fails to solve the difference equation.

Definition 5.15 If $y(t)$ is the solution to the initial-value problem

\[
y' = f(t, y), \quad a \le t \le b, \quad y(a) = \alpha,
\]

and

\[
\begin{aligned}
w_{i+1} &= a_{m-1} w_i + a_{m-2} w_{i-1} + \cdots + a_0 w_{i+1-m} \\
&\quad + h \big[ b_m f(t_{i+1}, w_{i+1}) + b_{m-1} f(t_i, w_i) + \cdots + b_0 f(t_{i+1-m}, w_{i+1-m}) \big]
\end{aligned}
\]

is the $(i+1)$st step in a multistep method, the local truncation error at this step is

\[
\begin{aligned}
\tau_{i+1}(h) &= \frac{y(t_{i+1}) - a_{m-1} y(t_i) - \cdots - a_0 y(t_{i+1-m})}{h} \\
&\quad - \big[ b_m f(t_{i+1}, y(t_{i+1})) + \cdots + b_0 f(t_{i+1-m}, y(t_{i+1-m})) \big], \qquad (5.32)
\end{aligned}
\]

for each $i = m-1, m, \dots, N-1$.

Example 3 Determine the local truncation error for the three-step Adams-Bashforth method derived in Example 2.

Solution Considering the form of the error given in Eq. (5.30) and the appropriate entry in Table 5.12 gives

\[
h^4 f^{(3)}(\mu_i, y(\mu_i)) (-1)^3 \int_0^1 \binom{-s}{3}\,ds = \frac{3h^4}{8} f^{(3)}(\mu_i, y(\mu_i)).
\]

Using the fact that $f^{(3)}(\mu_i, y(\mu_i)) = y^{(4)}(\mu_i)$ and the difference equation derived in Example 2, we have

\[
\begin{aligned}
\tau_{i+1}(h) &= \frac{y(t_{i+1}) - y(t_i)}{h} - \frac{1}{12} \big[ 23 f(t_i, y(t_i)) - 16 f(t_{i-1}, y(t_{i-1})) + 5 f(t_{i-2}, y(t_{i-2})) \big] \\
&= \frac{1}{h} \left[ \frac{3h^4}{8} f^{(3)}(\mu_i, y(\mu_i)) \right] = \frac{3h^3}{8} y^{(4)}(\mu_i), \quad \text{for some } \mu_i \in (t_{i-2}, t_{i+1}).
\end{aligned}
\]
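The algebra in Example 2 (expanding the backward differences and collecting the coefficients of each $f$ value) can be automated. The sketch below is illustrative only; `adams_bashforth_weights` is a hypothetical helper name. It assumes the expansion $\nabla^k f_i = \sum_{j=0}^{k} (-1)^j \binom{k}{j} f_{i-j}$ and takes the Table 5.12 entries as input.

```python
from fractions import Fraction
from math import comb


def adams_bashforth_weights(gammas):
    """Collect the coefficient of f(t_{i-j}, w_{i-j}) for j = 0, ..., m-1.

    gammas[k] is the Table 5.12 entry multiplying the k-th backward
    difference, and nabla^k f_i = sum_j (-1)^j C(k, j) f_{i-j}.
    """
    m = len(gammas)
    weights = [Fraction(0)] * m
    for k, gamma in enumerate(gammas):
        for j in range(k + 1):
            weights[j] += gamma * (-1) ** j * comb(k, j)
    return weights


# Three-step method (m = 3): weights 23/12, -16/12, 5/12
ab3 = adams_bashforth_weights([Fraction(1), Fraction(1, 2), Fraction(5, 12)])
```

Appending the next table entry, 3/8, yields the four-step weights 55/24, -59/24, 37/24, and -9/24 in the same way.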


Adams-Bashforth Explicit Methods 
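Some of the explicit multistep methods together with their required starting values and local truncation errors are as follows. The derivation of these techniques is similar to the procedure in Examples 2 and 3.

Adams-Bashforth Two-Step Explicit Method

\[
w_0 = \alpha, \quad w_1 = \alpha_1,
\]
\[
w_{i+1} = w_i + \frac{h}{2} \big[ 3 f(t_i, w_i) - f(t_{i-1}, w_{i-1}) \big], \qquad (5.33)
\]

where $i = 1, 2, \dots, N-1$. The local truncation error is $\tau_{i+1}(h) = \frac{5}{12} y'''(\mu_i) h^2$, for some $\mu_i \in (t_{i-1}, t_{i+1})$.

Adams-Bashforth Three-Step Explicit Method

\[
w_0 = \alpha, \quad w_1 = \alpha_1, \quad w_2 = \alpha_2,
\]
\[
w_{i+1} = w_i + \frac{h}{12} \big[ 23 f(t_i, w_i) - 16 f(t_{i-1}, w_{i-1}) + 5 f(t_{i-2}, w_{i-2}) \big], \qquad (5.34)
\]

where $i = 2, 3, \dots, N-1$. The local truncation error is $\tau_{i+1}(h) = \frac{3}{8} y^{(4)}(\mu_i) h^3$, for some $\mu_i \in (t_{i-2}, t_{i+1})$.

Adams-Bashforth Four-Step Explicit Method

\[
w_0 = \alpha, \quad w_1 = \alpha_1, \quad w_2 = \alpha_2, \quad w_3 = \alpha_3, \qquad (5.35)
\]
\[
w_{i+1} = w_i + \frac{h}{24} \big[ 55 f(t_i, w_i) - 59 f(t_{i-1}, w_{i-1}) + 37 f(t_{i-2}, w_{i-2}) - 9 f(t_{i-3}, w_{i-3}) \big],
\]

where $i = 3, 4, \dots, N-1$. The local truncation error is $\tau_{i+1}(h) = \frac{251}{720} y^{(5)}(\mu_i) h^4$, for some $\mu_i \in (t_{i-3}, t_{i+1})$.

Adams-Bashforth Five-Step Explicit Method

\[
w_0 = \alpha, \quad w_1 = \alpha_1, \quad w_2 = \alpha_2, \quad w_3 = \alpha_3, \quad w_4 = \alpha_4,
\]
\[
\begin{aligned}
w_{i+1} = w_i + \frac{h}{720} \big[ & 1901 f(t_i, w_i) - 2774 f(t_{i-1}, w_{i-1}) + 2616 f(t_{i-2}, w_{i-2}) \\
& - 1274 f(t_{i-3}, w_{i-3}) + 251 f(t_{i-4}, w_{i-4}) \big], \qquad (5.36)
\end{aligned}
\]

where $i = 4, 5, \dots, N-1$. The local truncation error is $\tau_{i+1}(h) = \frac{95}{288} y^{(6)}(\mu_i) h^5$, for some $\mu_i \in (t_{i-4}, t_{i+1})$.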
Adams-Moulton Implicit Methods

Implicit methods are derived by using $(t_{i+1}, f(t_{i+1}, y(t_{i+1})))$ as an additional interpolation node in the approximation of the integral

\[
\int_{t_i}^{t_{i+1}} f(t, y(t))\,dt.
\]

Some of the more common implicit methods are as follows.

Adams-Moulton Two-Step Implicit Method

\[
w_0 = \alpha, \quad w_1 = \alpha_1,
\]
\[
w_{i+1} = w_i + \frac{h}{12} \big[ 5 f(t_{i+1}, w_{i+1}) + 8 f(t_i, w_i) - f(t_{i-1}, w_{i-1}) \big], \qquad (5.37)
\]

where $i = 1, 2, \dots, N-1$. The local truncation error is $\tau_{i+1}(h) = -\frac{1}{24} y^{(4)}(\mu_i) h^3$, for some $\mu_i \in (t_{i-1}, t_{i+1})$.

Adams-Moulton Three-Step Implicit Method

\[
w_0 = \alpha, \quad w_1 = \alpha_1, \quad w_2 = \alpha_2, \qquad (5.38)
\]
\[
w_{i+1} = w_i + \frac{h}{24} \big[ 9 f(t_{i+1}, w_{i+1}) + 19 f(t_i, w_i) - 5 f(t_{i-1}, w_{i-1}) + f(t_{i-2}, w_{i-2}) \big],
\]

where $i = 2, 3, \dots, N-1$. The local truncation error is $\tau_{i+1}(h) = -\frac{19}{720} y^{(5)}(\mu_i) h^4$, for some $\mu_i \in (t_{i-2}, t_{i+1})$.

Adams-Moulton Four-Step Implicit Method

\[
w_0 = \alpha, \quad w_1 = \alpha_1, \quad w_2 = \alpha_2, \quad w_3 = \alpha_3,
\]
\[
\begin{aligned}
w_{i+1} = w_i + \frac{h}{720} \big[ & 251 f(t_{i+1}, w_{i+1}) + 646 f(t_i, w_i) - 264 f(t_{i-1}, w_{i-1}) \\
& + 106 f(t_{i-2}, w_{i-2}) - 19 f(t_{i-3}, w_{i-3}) \big], \qquad (5.39)
\end{aligned}
\]

where $i = 3, 4, \dots, N-1$. The local truncation error is $\tau_{i+1}(h) = -\frac{3}{160} y^{(6)}(\mu_i) h^5$, for some $\mu_i \in (t_{i-3}, t_{i+1})$.

It is interesting to compare an $m$-step Adams-Bashforth explicit method with an $(m-1)$-step Adams-Moulton implicit method. Both involve $m$ evaluations of $f$ per step, and both have the terms $y^{(m+1)}(\mu_i) h^m$ in their local truncation errors. In general, the coefficients of the terms involving $f$ in the local truncation error are smaller for the implicit methods than for the explicit methods. This leads to greater stability and smaller round-off errors for the implicit methods.

Example 4 Consider the initial-value problem

\[
y' = y - t^2 + 1, \quad 0 \le t \le 2, \quad y(0) = 0.5.
\]

Use the exact values given from $y(t) = (t+1)^2 - 0.5e^t$ as starting values and $h = 0.2$ to compare the approximations from (a) the explicit Adams-Bashforth four-step method and (b) the implicit Adams-Moulton three-step method.
Solution (a) The Adams-Bashforth method has the difference equation

\[
w_{i+1} = w_i + \frac{h}{24} \big[ 55 f(t_i, w_i) - 59 f(t_{i-1}, w_{i-1}) + 37 f(t_{i-2}, w_{i-2}) - 9 f(t_{i-3}, w_{i-3}) \big],
\]

for $i = 3, 4, \dots, 9$. When simplified using $f(t, y) = y - t^2 + 1$, $h = 0.2$, and $t_i = 0.2i$, it becomes

\[
w_{i+1} = \frac{1}{24} \big[ 35 w_i - 11.8 w_{i-1} + 7.4 w_{i-2} - 1.8 w_{i-3} - 0.192 i^2 - 0.192 i + 4.736 \big].
\]

(b) The Adams-Moulton method has the difference equation

\[
w_{i+1} = w_i + \frac{h}{24} \big[ 9 f(t_{i+1}, w_{i+1}) + 19 f(t_i, w_i) - 5 f(t_{i-1}, w_{i-1}) + f(t_{i-2}, w_{i-2}) \big],
\]

for $i = 2, 3, \dots, 9$. This reduces to

\[
w_{i+1} = \frac{1}{24} \big[ 1.8 w_{i+1} + 27.8 w_i - w_{i-1} + 0.2 w_{i-2} - 0.192 i^2 - 0.192 i + 4.736 \big].
\]

To use this method explicitly, we need to solve the equation for $w_{i+1}$. This gives

\[
w_{i+1} = \frac{1}{22.2} \big[ 27.8 w_i - w_{i-1} + 0.2 w_{i-2} - 0.192 i^2 - 0.192 i + 4.736 \big],
\]

for $i = 2, 3, \dots, 9$.

The results in Table 5.13 were obtained using the exact values from $y(t) = (t+1)^2 - 0.5e^t$ for $\alpha$, $\alpha_1$, $\alpha_2$, and $\alpha_3$ in the explicit Adams-Bashforth case and for $\alpha$, $\alpha_1$, and $\alpha_2$ in the implicit Adams-Moulton case. Note that the implicit Adams-Moulton method gives consistently better results.
Table 5.13

                      Adams-Bashforth           Adams-Moulton
 t_i   Exact          w_i         Error         w_i         Error
 0.0   0.5000000
 0.2   0.8292986
 0.4   1.2140877
 0.6   1.6489406                                1.6489341   0.0000065
 0.8   2.1272295      2.1273124   0.0000828     2.1272136   0.0000160
 1.0   2.6408591      2.6410810   0.0002219     2.6408298   0.0000293
 1.2   3.1799415      3.1803480   0.0004065     3.1798937   0.0000478
 1.4   3.7324000      3.7330601   0.0006601     3.7323270   0.0000731
 1.6   4.2834838      4.2844931   0.0010093     4.2833767   0.0001071
 1.8   4.8151763      4.8166575   0.0014812     4.8150236   0.0001527
 2.0   5.3054720      5.3075838   0.0021119     5.3052587   0.0002132
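The comparison in Example 4 can be reproduced with a few lines of Python. This is a sketch under stated assumptions, not the software used to generate Table 5.13: the function names are hypothetical, and the Adams-Moulton routine relies on $f(t, y) = y - t^2 + 1$ being linear in $y$, which is what allows the implicit equation to be solved for $w_{i+1}$ in closed form (the divisor $1 - 9h/24 = 22.2/24$ when $h = 0.2$).

```python
import math


def f(t, y):
    """Right-hand side of the test problem y' = y - t^2 + 1."""
    return y - t * t + 1.0


def exact(t):
    """Exact solution y(t) = (t + 1)^2 - 0.5 e^t."""
    return (t + 1.0) ** 2 - 0.5 * math.exp(t)


def ab4_example(h=0.2, n=10):
    """Explicit Adams-Bashforth four-step method with exact starting values."""
    w = [exact(i * h) for i in range(4)]
    for i in range(3, n):
        fs = [f((i - j) * h, w[i - j]) for j in range(4)]
        w.append(w[i] + h * (55 * fs[0] - 59 * fs[1] + 37 * fs[2] - 9 * fs[3]) / 24)
    return w


def am3_example(h=0.2, n=10):
    """Implicit Adams-Moulton three-step method, solved exactly for w_{i+1}."""
    w = [exact(i * h) for i in range(3)]
    for i in range(2, n):
        t_next = (i + 1) * h
        known = (19 * f(i * h, w[i]) - 5 * f((i - 1) * h, w[i - 1])
                 + f((i - 2) * h, w[i - 2]))
        # Move the 9 h w_{i+1} / 24 term to the left side and divide.
        rhs = w[i] + h * (9 * (1.0 - t_next ** 2) + known) / 24
        w.append(rhs / (1.0 - 9 * h / 24))
    return w
```

For a right-hand side that is nonlinear in $y$, the division in `am3_example` is no longer available and an iterative solve would be needed instead.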


Multistep methods are available as options of the InitialValueProblem command, in a manner similar to that of the one-step methods. The command for the Adams-Bashforth Four-Step method applied to our usual example has the form


C := InitialValueProblem(deq, y(0) = 0.5, t = 2, method = adamsbashforth,
submethod = step4, numsteps = 10, output = information, digits = 8)


The output from this method is similar to the results in Table 5.13 except that the exact values were used in Table 5.13 and approximations were used as starting values for the Maple approximations.

To apply the Adams-Moulton Three-Step method to this problem, the options would be changed to method = adamsmoulton, submethod = step3.


Predictor-Corrector Methods 


In Example 4 the implicit Adams-Moulton method gave better results than the explicit Adams-Bashforth method of the same order. Although this is generally the case, the implicit methods have the inherent weakness of first having to convert the method algebraically to an explicit representation for $w_{i+1}$. This procedure is not always possible, as can be seen by considering the elementary initial-value problem

\[
y' = e^y, \quad 0 \le t \le 0.25, \quad y(0) = 1.
\]

Because $f(t, y) = e^y$, the three-step Adams-Moulton method has

\[
w_{i+1} = w_i + \frac{h}{24} \big[ 9 e^{w_{i+1}} + 19 e^{w_i} - 5 e^{w_{i-1}} + e^{w_{i-2}} \big]
\]

as its difference equation, and this equation cannot be algebraically solved for $w_{i+1}$.

We could use Newton's method or the secant method to approximate $w_{i+1}$, but this complicates the procedure considerably. In practice, implicit multistep methods are not used as described above. Rather, they are used to improve approximations obtained by explicit methods. The combination of an explicit method to predict and an implicit method to improve the prediction is called a predictor-corrector method.
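To make the cost of the implicit approach concrete, the sketch below applies Newton's method to the three-step Adams-Moulton equation for $y' = e^y$. It is an illustration only: the function name, tolerance, and iteration cap are assumptions, and the root sought is that of $g(w) = w - w_i - \frac{h}{24}\big[9e^{w} + 19e^{w_i} - 5e^{w_{i-1}} + e^{w_{i-2}}\big]$, with $w_i$ as the initial guess.

```python
import math


def am3_newton_step(w_i, w_im1, w_im2, h, tol=1e-12, max_iter=20):
    """Solve the three-step Adams-Moulton equation for y' = e^y by Newton's method."""
    # Part of the right-hand side that does not depend on the unknown w_{i+1}.
    known = w_i + h * (19 * math.exp(w_i) - 5 * math.exp(w_im1)
                       + math.exp(w_im2)) / 24
    w = w_i  # initial guess: the previous approximation
    for _ in range(max_iter):
        g = w - known - 9 * h * math.exp(w) / 24
        dg = 1.0 - 9 * h * math.exp(w) / 24
        step = g / dg
        w -= step
        if abs(step) < tol:
            return w
    raise RuntimeError("Newton iteration did not converge")
```

Each time step now needs several evaluations of $e^{w}$ inside an inner loop, which is the extra work the predictor-corrector idea avoids.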

Consider the following fourth-order method for solving an initial-value problem. The first step is to calculate the starting values $w_0$, $w_1$, $w_2$, and $w_3$ for the four-step explicit Adams-Bashforth method. To do this, we use a fourth-order one-step method, the Runge-Kutta method of order four. The next step is to calculate an approximation, $w_{4p}$, to $y(t_4)$ using the explicit Adams-Bashforth method as predictor:

\[
w_{4p} = w_3 + \frac{h}{24} \big[ 55 f(t_3, w_3) - 59 f(t_2, w_2) + 37 f(t_1, w_1) - 9 f(t_0, w_0) \big].
\]

This approximation is improved by inserting $w_{4p}$ in the right side of the three-step implicit Adams-Moulton method and using that method as a corrector. This gives

\[
w_4 = w_3 + \frac{h}{24} \big[ 9 f(t_4, w_{4p}) + 19 f(t_3, w_3) - 5 f(t_2, w_2) + f(t_1, w_1) \big].
\]

The only new function evaluation required in this procedure is $f(t_4, w_{4p})$ in the corrector equation; all the other values of $f$ have been calculated for earlier approximations.

The value $w_4$ is then used as the approximation to $y(t_4)$, and the technique of using the Adams-Bashforth method as a predictor and the Adams-Moulton method as a corrector is repeated to find $w_{5p}$ and $w_5$, the initial and final approximations to $y(t_5)$. This process is continued until we obtain an approximation $w_N$ to $y(t_N) = y(b)$.

Improved approximations to $y(t_{i+1})$ might be obtained by iterating the Adams-Moulton formula, but these converge to the approximation given by the implicit formula rather than to the solution $y(t_{i+1})$. Hence it is usually more efficient to use a reduction in the step size if improved accuracy is needed.

Algorithm 5.4 is based on the fourth-order Adams-Bashforth method as predictor and 
one iteration of the Adams-Moulton method as corrector, with the starting values obtained 
from the fourth-order Runge-Kutta method. 
Adams Fourth-Order Predictor-Corrector

To approximate the solution of the initial-value problem

    y' = f(t, y),  a ≤ t ≤ b,  y(a) = α,

at (N + 1) equally spaced numbers in the interval [a, b]:

INPUT  endpoints a, b; integer N; initial condition α.
OUTPUT approximation w to y at the (N + 1) values of t.

Step 1  Set h = (b − a)/N;
            t_0 = a;
            w_0 = α;
        OUTPUT (t_0, w_0).
Step 2  For i = 1, 2, 3, do Steps 3-5.
        (Compute starting values using Runge-Kutta method.)
    Step 3  Set K_1 = h f(t_{i−1}, w_{i−1});
                K_2 = h f(t_{i−1} + h/2, w_{i−1} + K_1/2);
                K_3 = h f(t_{i−1} + h/2, w_{i−1} + K_2/2);
                K_4 = h f(t_{i−1} + h, w_{i−1} + K_3).
    Step 4  Set w_i = w_{i−1} + (K_1 + 2K_2 + 2K_3 + K_4)/6;
                t_i = a + ih.
    Step 5  OUTPUT (t_i, w_i).
Step 6  For i = 4, ..., N do Steps 7-10.
    Step 7  Set t = a + ih;
                w = w_3 + h[55 f(t_3, w_3) − 59 f(t_2, w_2) + 37 f(t_1, w_1)
                        − 9 f(t_0, w_0)]/24;  (Predict w_i.)
                w = w_3 + h[9 f(t, w) + 19 f(t_3, w_3) − 5 f(t_2, w_2)
                        + f(t_1, w_1)]/24.  (Correct w_i.)
    Step 8  OUTPUT (t, w).
    Step 9  For j = 0, 1, 2
                set t_j = t_{j+1};  (Prepare for next iteration.)
                    w_j = w_{j+1}.
    Step 10 Set t_3 = t;
                w_3 = w.
Step 11 STOP.
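The steps above can be sketched in Python as follows. This is an illustrative translation rather than a reference implementation: the function name is hypothetical, the whole mesh is stored in lists instead of the four-value window the algorithm shifts in Steps 9 and 10, and $N \ge 4$ is assumed.

```python
def adams_predictor_corrector(f, a, b, alpha, n):
    """Fourth-order Adams predictor-corrector on n equal steps (n >= 4 assumed)."""
    h = (b - a) / n
    t = [a + i * h for i in range(n + 1)]
    w = [alpha]

    # Steps 2-5: three Runge-Kutta order-four steps for the starting values.
    for i in range(1, 4):
        k1 = h * f(t[i - 1], w[i - 1])
        k2 = h * f(t[i - 1] + h / 2, w[i - 1] + k1 / 2)
        k3 = h * f(t[i - 1] + h / 2, w[i - 1] + k2 / 2)
        k4 = h * f(t[i - 1] + h, w[i - 1] + k3)
        w.append(w[i - 1] + (k1 + 2 * k2 + 2 * k3 + k4) / 6)

    # Steps 6-10: predict with Adams-Bashforth, correct once with Adams-Moulton.
    for i in range(4, n + 1):
        f3 = f(t[i - 1], w[i - 1])
        f2 = f(t[i - 2], w[i - 2])
        f1 = f(t[i - 3], w[i - 3])
        f0 = f(t[i - 4], w[i - 4])
        w_pred = w[i - 1] + h * (55 * f3 - 59 * f2 + 37 * f1 - 9 * f0) / 24
        w_corr = w[i - 1] + h * (9 * f(t[i], w_pred) + 19 * f3 - 5 * f2 + f1) / 24
        w.append(w_corr)
    return t, w
```

For clarity the sketch re-evaluates $f$ at earlier mesh points on every pass; a production version would cache those values so that each step costs only two new evaluations.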


Example 5 Apply the Adams fourth-order predictor-corrector method with $h = 0.2$ and starting values from the Runge-Kutta fourth-order method to the initial-value problem

\[
y' = y - t^2 + 1, \quad 0 \le t \le 2, \quad y(0) = 0.5.
\]

Solution This is a continuation and modification of the problem considered in Example 1 at the beginning of the section. In that example we found that the starting approximations from Runge-Kutta are

\[
y(0) = w_0 = 0.5, \quad y(0.2) \approx w_1 = 0.8292933, \quad y(0.4) \approx w_2 = 1.2140762, \quad \text{and} \quad y(0.6) \approx w_3 = 1.6489220,
\]


and the fourth-order Adams-Bashforth method gave

\[
\begin{aligned}
y(0.8) \approx w_{4p} &= w_3 + \frac{0.2}{24} \big( 55 f(0.6, w_3) - 59 f(0.4, w_2) + 37 f(0.2, w_1) - 9 f(0, w_0) \big) \\
&= 1.6489220 + \frac{0.2}{24} \big( 55 f(0.6, 1.6489220) - 59 f(0.4, 1.2140762) \\
&\qquad + 37 f(0.2, 0.8292933) - 9 f(0, 0.5) \big) \\
&= 1.6489220 + 0.0083333 \big( 55(2.2889220) - 59(2.0540762) + 37(1.7892933) - 9(1.5) \big) \\
&= 2.1272892.
\end{aligned}
\]

We will now use $w_{4p}$ as the predictor of the approximation to $y(0.8)$ and determine the corrected value $w_4$ from the implicit Adams-Moulton method. This gives

\[
\begin{aligned}
y(0.8) \approx w_4 &= w_3 + \frac{0.2}{24} \big( 9 f(0.8, w_{4p}) + 19 f(0.6, w_3) - 5 f(0.4, w_2) + f(0.2, w_1) \big) \\
&= 1.6489220 + \frac{0.2}{24} \big( 9 f(0.8, 2.1272892) + 19 f(0.6, 1.6489220) \\
&\qquad - 5 f(0.4, 1.2140762) + f(0.2, 0.8292933) \big) \\
&= 1.6489220 + 0.0083333 \big( 9(2.4872892) + 19(2.2889220) - 5(2.0540762) + (1.7892933) \big) \\
&= 2.1272056.
\end{aligned}
\]


Now we use this approximation to determine the predictor, $w_{5p}$, for $y(1.0)$ as

\[
\begin{aligned}
y(1.0) \approx w_{5p} &= w_4 + \frac{0.2}{24} \big( 55 f(0.8, w_4) - 59 f(0.6, w_3) + 37 f(0.4, w_2) - 9 f(0.2, w_1) \big) \\
&= 2.1272056 + \frac{0.2}{24} \big( 55 f(0.8, 2.1272056) - 59 f(0.6, 1.6489220) \\
&\qquad + 37 f(0.4, 1.2140762) - 9 f(0.2, 0.8292933) \big) \\
&= 2.1272056 + 0.0083333 \big( 55(2.4872056) - 59(2.2889220) + 37(2.0540762) - 9(1.7892933) \big) \\
&= 2.6409314,
\end{aligned}
\]


and correct this with

\[
\begin{aligned}
y(1.0) \approx w_5 &= w_4 + \frac{0.2}{24} \big( 9 f(1.0, w_{5p}) + 19 f(0.8, w_4) - 5 f(0.6, w_3) + f(0.4, w_2) \big) \\
&= 2.1272056 + \frac{0.2}{24} \big( 9 f(1.0, 2.6409314) + 19 f(0.8, 2.1272056) \\
&\qquad - 5 f(0.6, 1.6489220) + f(0.4, 1.2140762) \big) \\
&= 2.1272056 + 0.0083333 \big( 9(2.6409314) + 19(2.4872056) - 5(2.2889220) + (2.0540762) \big) \\
&= 2.6408286.
\end{aligned}
\]

In Example 1 we found that using the explicit Adams-Bashforth method alone produced results that were inferior to those of Runge-Kutta. However, these approximations to $y(0.8)$ and $y(1.0)$ are accurate to within

\[
|2.1272295 - 2.1272056| = 2.39 \times 10^{-5} \quad \text{and} \quad |2.6408286 - 2.6408591| = 3.05 \times 10^{-5},
\]

respectively, compared to those of Runge-Kutta, which were accurate, respectively, to within

\[
|2.1272027 - 2.1272295| = 2.68 \times 10^{-5} \quad \text{and} \quad |2.6408227 - 2.6408591| = 3.64 \times 10^{-5}.
\]

The remaining predictor-corrector approximations were generated using Algorithm 5.4 and are shown in Table 5.14.

Table 5.14

                                        Error
 t_i   y_i = y(t_i)   w_i         |y_i − w_i|
 0.0   0.5000000      0.5000000   0
 0.2   0.8292986      0.8292933   0.0000053
 0.4   1.2140877      1.2140762   0.0000114
 0.6   1.6489406      1.6489220   0.0000186
 0.8   2.1272295      2.1272056   0.0000239
 1.0   2.6408591      2.6408286   0.0000305
 1.2   3.1799415      3.1799026   0.0000389
 1.4   3.7324000      3.7323505   0.0000495
 1.6   4.2834838      4.2834208   0.0000630
 1.8   4.8151763      4.8150964   0.0000799
 2.0   5.3054720      5.3053707   0.0001013

The Adams Fourth-Order Predictor-Corrector method is implemented in Maple for the example problem with


C := InitialValueProblem(deq, y(0) = 0.5, t = 2, method = adamsbashforthmoulton,
submethod = step4, numsteps = 10, output = information, digits = 8)


and generates the same values as in Table 5.14.

Other multistep methods can be derived using integration of interpolating polynomials over intervals of the form $[t_j, t_{i+1}]$, for $j \le i-1$, to obtain an approximation to $y(t_{i+1})$. When an interpolating polynomial is integrated over $[t_{i-3}, t_{i+1}]$, the result is the explicit Milne's method:

\[
w_{i+1} = w_{i-3} + \frac{4h}{3} \big[ 2 f(t_i, w_i) - f(t_{i-1}, w_{i-1}) + 2 f(t_{i-2}, w_{i-2}) \big],
\]

which has local truncation error $\frac{14}{45} h^4 y^{(5)}(\xi_i)$, for some $\xi_i \in (t_{i-3}, t_{i+1})$.

Edward Arthur Milne (1896-1950) worked in ballistic research during World War I, and then for the Solar Physics Observatory at Cambridge. In 1929 he was appointed to the W. W. Rouse Ball chair at Wadham College in Oxford.

Milne's method is occasionally used as a predictor for the implicit Simpson's method,

\[
w_{i+1} = w_{i-1} + \frac{h}{3} \big[ f(t_{i+1}, w_{i+1}) + 4 f(t_i, w_i) + f(t_{i-1}, w_{i-1}) \big],
\]

which has local truncation error $-(h^4/90) y^{(5)}(\xi_i)$, for some $\xi_i \in (t_{i-1}, t_{i+1})$, and is obtained by integrating an interpolating polynomial over $[t_{i-1}, t_{i+1}]$.

Simpson's name is associated with this technique because it is based on Simpson's rule for integration.
The local truncation error involved with a predictor-corrector method of the Milne-Simpson type is generally smaller than that of the Adams-Bashforth-Moulton method. But the technique has limited use because of round-off error problems, which do not occur with the Adams procedure. Elaboration on this difficulty is given in Section 5.10.


EXERCISE SET 5.6

1. Use all the Adams-Bashforth methods to approximate the solutions to the following initial-value problems. In each case use exact starting values, and compare the results to the actual values.
   a. $y' = te^{3t} - 2y$, $0 \le t \le 1$, $y(0) = 0$, with $h = 0.2$; actual solution $y(t) = \frac{1}{5} te^{3t} - \frac{1}{25} e^{3t} + \frac{1}{25} e^{-2t}$.
   b. $y' = 1 + (t - y)^2$, $2 \le t \le 3$, $y(2) = 1$, with $h = 0.2$; actual solution $y(t) = t + \frac{1}{1-t}$.
   c. $y' = 1 + y/t$, $1 \le t \le 2$, $y(1) = 2$, with $h = 0.2$; actual solution $y(t) = t \ln t + 2t$.
   d. $y' = \cos 2t + \sin 3t$, $0 \le t \le 1$, $y(0) = 1$, with $h = 0.2$; actual solution $y(t) = \frac{1}{2} \sin 2t - \frac{1}{3} \cos 3t + \frac{4}{3}$.

2. Use each of the Adams-Bashforth methods to approximate the solutions to the following initial-value problems. In each case use starting values obtained from the Runge-Kutta method of order four. Compare the results to the actual values.
   a. $y' = \frac{2 - 2ty}{t^2 + 1}$, $0 \le t \le 1$, $y(0) = 1$, with $h = 0.1$; actual solution $y(t) = \frac{2t + 1}{t^2 + 1}$.
   b. $y' = \frac{y^2}{1 + t}$, $1 \le t \le 2$, $y(1) = -(\ln 2)^{-1}$, with $h = 0.1$; actual solution $y(t) = \frac{-1}{\ln(t + 1)}$.
   c. $y' = (y^2 + y)/t$, $1 \le t \le 3$, $y(1) = -2$, with $h = 0.2$; actual solution $y(t) = \frac{2t}{1 - 2t}$.
   d. $y' = -ty + 4t/y$, $0 \le t \le 1$, $y(0) = 1$, with $h = 0.1$; actual solution $y(t) = \sqrt{4 - 3e^{-t^2}}$.

3. Use each of the Adams-Bashforth methods to approximate the solutions to the following initial-value problems. In each case use starting values obtained from the Runge-Kutta method of order four. Compare the results to the actual values.
   a. $y' = y/t - (y/t)^2$, $1 \le t \le 2$, $y(1) = 1$, with $h = 0.1$; actual solution $y(t) = \frac{t}{1 + \ln t}$.
   b. $y' = 1 + y/t + (y/t)^2$, $1 \le t \le 3$, $y(1) = 0$, with $h = 0.2$; actual solution $y(t) = t \tan(\ln t)$.
   c. $y' = -(y + 1)(y + 3)$, $0 \le t \le 2$, $y(0) = -2$, with $h = 0.1$; actual solution $y(t) = -3 + 2/(1 + e^{-2t})$.
   d. $y' = -5y + 5t^2 + 2t$, $0 \le t \le 1$, $y(0) = 1/3$, with $h = 0.1$; actual solution $y(t) = t^2 + \frac{1}{3} e^{-5t}$.

4. Use all the Adams-Moulton methods to approximate the solutions to Exercises 1(a), 1(c), and 1(d). In each case use exact starting values, and explicitly solve for $w_{i+1}$. Compare the results to the actual values.

5. Use Algorithm 5.4 to approximate the solutions to the initial-value problems in Exercise 1.

6. Use Algorithm 5.4 to approximate the solutions to the initial-value problems in Exercise 2.

7. Use Algorithm 5.4 to approximate the solutions to the initial-value problems in Exercise 3.

8. Change Algorithm 5.4 so that the corrector can be iterated for a given number $p$ of iterations. Repeat Exercise 7 with $p = 2$, 3, and 4 iterations. Which choice of $p$ gives the best answer for each initial-value problem?

9. The initial-value problem

   \[ y' = e^y, \quad 0 \le t \le 0.20, \quad y(0) = 1 \]

   has solution

   \[ y(t) = 1 - \ln(1 - et). \]

   Applying the three-step Adams-Moulton method to this problem is equivalent to finding the fixed point $w_{i+1}$ of

   \[ g(w) = w_i + \frac{h}{24} \left( 9e^{w} + 19e^{w_i} - 5e^{w_{i-1}} + e^{w_{i-2}} \right). \]

   a. With $h = 0.01$, obtain $w_{i+1}$ by functional iteration for $i = 2, \dots, 19$ using exact starting values $w_0$, $w_1$, and $w_2$. At each step use $w_i$ to initially approximate $w_{i+1}$.
   b. Will Newton's method speed the convergence over functional iteration?

10. Use the Milne-Simpson Predictor-Corrector method to approximate the solutions to the initial-value problems in Exercise 3.

11. a. Derive the Adams-Bashforth Two-Step method by using the Lagrange form of the interpolating polynomial.
    b. Derive the Adams-Bashforth Four-Step method by using Newton's backward-difference form of the interpolating polynomial.

12. Derive the Adams-Bashforth Three-Step method by the following method. Set

    \[ y(t_{i+1}) = y(t_i) + ah f(t_i, y(t_i)) + bh f(t_{i-1}, y(t_{i-1})) + ch f(t_{i-2}, y(t_{i-2})). \]

    Expand $y(t_{i+1})$, $f(t_{i-2}, y(t_{i-2}))$, and $f(t_{i-1}, y(t_{i-1}))$ in Taylor series about $(t_i, y(t_i))$, and equate the coefficients of $h$, $h^2$, and $h^3$ to obtain $a$, $b$, and $c$.

13. Derive the Adams-Moulton Two-Step method and its local truncation error by using an appropriate form of an interpolating polynomial.

14. Derive Simpson's method by applying Simpson's rule to the integral

    \[ y(t_{i+1}) - y(t_{i-1}) = \int_{t_{i-1}}^{t_{i+1}} f(t, y(t))\,dt. \]

15. Derive Milne's method by applying the open Newton-Cotes formula (4.29) to the integral

    \[ y(t_{i+1}) - y(t_{i-3}) = \int_{t_{i-3}}^{t_{i+1}} f(t, y(t))\,dt. \]

16. Verify the entries in Table 5.12 on page 305.


5.7 Variable Step-Size Multistep Methods


The Runge-Kutta-Fehlberg method is used for error control because at each step it provides, 
at little additional cost, two approximations that can be compared and related to the local 
truncation error. Predictor-corrector techniques always generate two approximations at each 
step, so they are natural candidates for error-control adaptation. 

To demonstrate the error-control procedure, we construct a variable step-size predictor- 
corrector method using the four-step explicit Adams-Bashforth method as predictor and the 
three-step implicit Adams-Moulton method as corrector. 

The Adams-Bashforth four-step method comes from the relation

\[
\begin{aligned}
y(t_{i+1}) = y(t_i) + \frac{h}{24} \big[ & 55 f(t_i, y(t_i)) - 59 f(t_{i-1}, y(t_{i-1})) \\
& + 37 f(t_{i-2}, y(t_{i-2})) - 9 f(t_{i-3}, y(t_{i-3})) \big] + \frac{251}{720} y^{(5)}(\hat{\mu}_i) h^5,
\end{aligned}
\]

for some $\hat{\mu}_i \in (t_{i-3}, t_{i+1})$. The assumption that the approximations $w_0, w_1, \dots, w_i$ are all exact implies that the Adams-Bashforth local truncation error is

\[
\frac{y(t_{i+1}) - w_{i+1,p}}{h} = \frac{251}{720} y^{(5)}(\hat{\mu}_i) h^4. \qquad (5.40)
\]

A similar analysis of the Adams-Moulton three-step method, which comes from

\[
\begin{aligned}
y(t_{i+1}) = y(t_i) + \frac{h}{24} \big[ & 9 f(t_{i+1}, y(t_{i+1})) + 19 f(t_i, y(t_i)) - 5 f(t_{i-1}, y(t_{i-1})) \\
& + f(t_{i-2}, y(t_{i-2})) \big] - \frac{19}{720} y^{(5)}(\tilde{\mu}_i) h^5,
\end{aligned}
\]

for some $\tilde{\mu}_i \in (t_{i-2}, t_{i+1})$, leads to the local truncation error

\[
\frac{y(t_{i+1}) - w_{i+1}}{h} = -\frac{19}{720} y^{(5)}(\tilde{\mu}_i) h^4. \qquad (5.41)
\]

To proceed further, we must make the assumption that for small values of $h$, we have

\[
y^{(5)}(\hat{\mu}_i) \approx y^{(5)}(\tilde{\mu}_i).
\]

The effectiveness of the error-control technique depends directly on this assumption.

If we subtract Eq. (5.41) from Eq. (5.40), we have

\[
\frac{w_{i+1} - w_{i+1,p}}{h} = \frac{h^4}{720} \big[ 251 y^{(5)}(\hat{\mu}_i) + 19 y^{(5)}(\tilde{\mu}_i) \big] \approx \frac{3}{8} h^4 y^{(5)}(\tilde{\mu}_i),
\]

so

\[
y^{(5)}(\tilde{\mu}_i) \approx \frac{8}{3h^5} (w_{i+1} - w_{i+1,p}). \qquad (5.42)
\]

Using this result to eliminate the term involving $y^{(5)}(\tilde{\mu}_i) h^4$ from Eq. (5.41) gives the approximation to the Adams-Moulton local truncation error

\[
|\tau_{i+1}(h)| = \frac{|y(t_{i+1}) - w_{i+1}|}{h} \approx \frac{19 h^4}{720} \cdot \frac{8}{3h^5} |w_{i+1} - w_{i+1,p}| = \frac{19 |w_{i+1} - w_{i+1,p}|}{270 h}.
\]

Suppose we now reconsider Eq. (5.41) with a new step size $qh$ generating new approximations $\hat{w}_{i+1,p}$ and $\hat{w}_{i+1}$. The object is to choose $q$ so that the local truncation error given in Eq. (5.41) is bounded by a prescribed tolerance $\varepsilon$. If we assume that the value $y^{(5)}(\mu)$ in Eq. (5.41) associated with $qh$ is also approximated using Eq. (5.42), then

\[
\begin{aligned}
\frac{|y(t_i + qh) - \hat{w}_{i+1}|}{qh} &= \frac{19 q^4 h^4}{720} |y^{(5)}(\mu)| \approx \frac{19 q^4 h^4}{720} \left[ \frac{8}{3h^5} |w_{i+1} - w_{i+1,p}| \right] \\
&= \frac{19 q^4}{270} \frac{|w_{i+1} - w_{i+1,p}|}{h},
\end{aligned}
\]

and we need to choose $q$ so that

\[
\frac{|y(t_i + qh) - \hat{w}_{i+1}|}{qh} \approx \frac{19 q^4}{270} \frac{|w_{i+1} - w_{i+1,p}|}{h} < \varepsilon.
\]

That is, choose $q$ so that

\[
q < \left( \frac{270}{19} \frac{h \varepsilon}{|w_{i+1} - w_{i+1,p}|} \right)^{1/4} \approx 2 \left( \frac{h \varepsilon}{|w_{i+1} - w_{i+1,p}|} \right)^{1/4}.
\]

A number of approximation assumptions have been made in this development, so in practice $q$ is chosen conservatively, often as

\[
q = 1.5 \left( \frac{h \varepsilon}{|w_{i+1} - w_{i+1,p}|} \right)^{1/4}.
\]

A change in step size for a multistep method is more costly in terms of function evaluations than for a one-step method, because new equally-spaced starting values must be computed. As a consequence, it is common practice to ignore the step-size change whenever the local truncation error is between $\varepsilon/10$ and $\varepsilon$, that is, when

\[
\frac{\varepsilon}{10} < |\tau_{i+1}(h)| = \frac{|y(t_{i+1}) - w_{i+1}|}{h} \approx \frac{19 |w_{i+1} - w_{i+1,p}|}{270 h} < \varepsilon.
\]

In addition, $q$ is given an upper bound to ensure that a single unusually accurate approximation does not result in too large a step size. Algorithm 5.5 incorporates this safeguard with an upper bound of 4.

Remember that the multistep methods require equal step sizes for the starting values. So any change in step size necessitates recalculating new starting values at that point. In Steps 3, 16, and 19 of Algorithm 5.5 this is done by calling a Runge-Kutta subalgorithm (Algorithm 5.2), which has been set up in Step 1.
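The error estimate and the step-size factor reduce to two short formulas. The sketch below is a hedged illustration with hypothetical function names; it uses the form $q = (\mathrm{TOL}/(2\sigma))^{1/4}$ that appears in Algorithm 5.5, with $\sigma = 19|w_{i+1} - w_{i+1,p}|/(270h)$, and clamps $q$ to the interval $[0.1, 4]$ as that algorithm does. Returning the upper bound when $\sigma = 0$ is an added guard against division by zero, not something stated in the text.

```python
def error_estimate(w_corr, w_pred, h):
    """Estimate sigma = 19 |w_corr - w_pred| / (270 h) of the local truncation error."""
    return 19.0 * abs(w_corr - w_pred) / (270.0 * h)


def step_factor(sigma, tol, q_min=0.1, q_max=4.0):
    """Return q = (tol / (2 sigma))^(1/4), clamped to [q_min, q_max]."""
    if sigma == 0.0:
        # Added guard (an assumption): a zero estimate allows the largest growth.
        return q_max
    q = (tol / (2.0 * sigma)) ** 0.25
    return max(q_min, min(q_max, q))


# With the predictor and corrector values at t = 0.8 for y' = y - t^2 + 1, h = 0.2:
sigma = error_estimate(2.1272056, 2.1272892, 0.2)   # about 2.941e-5
q = step_factor(sigma, 1e-5)                        # about 0.642
```

Since `sigma` exceeds the tolerance $10^{-5}$ here, the step would be rejected and retried with the smaller step size `q * 0.2`.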


Adams Variable Step-Size Predictor-Corrector

To approximate the solution of the initial-value problem

    y' = f(t, y),  a ≤ t ≤ b,  y(a) = α

with local truncation error within a given tolerance:

INPUT  endpoints a, b; initial condition α; tolerance TOL; maximum step size hmax; minimum step size hmin.

OUTPUT i, t_i, w_i, h where at the ith step w_i approximates y(t_i) and the step size h was used, or a message that the minimum step size was exceeded.

Step 1  Set up a subalgorithm for the Runge-Kutta fourth-order method to be called
        RK4(h, v_0, x_0, v_1, x_1, v_2, x_2, v_3, x_3) that accepts as input a step size h and
        starting values v_0 ≈ y(x_0) and returns {(x_j, v_j) | j = 1, 2, 3} defined by the
        following:
            for j = 1, 2, 3
                set K_1 = h f(x_{j−1}, v_{j−1});
                    K_2 = h f(x_{j−1} + h/2, v_{j−1} + K_1/2);
                    K_3 = h f(x_{j−1} + h/2, v_{j−1} + K_2/2);
                    K_4 = h f(x_{j−1} + h, v_{j−1} + K_3);
                    v_j = v_{j−1} + (K_1 + 2K_2 + 2K_3 + K_4)/6;
                    x_j = x_0 + jh.
Step 2  Set t_0 = a;
            w_0 = α;
            h = hmax;
            FLAG = 1;  (FLAG will be used to exit the loop in Step 4.)
            LAST = 0;  (LAST will indicate when the last value is calculated.)
        OUTPUT (t_0, w_0).
Step 3  Call RK4(h, w_0, t_0, w_1, t_1, w_2, t_2, w_3, t_3);
        Set NFLAG = 1;  (Indicates computation from RK4.)
            i = 4;
            t = t_3 + h.
Step 4  While (FLAG = 1) do Steps 5-20.
    Step 5  Set WP = w_{i−1} + (h/24)[55 f(t_{i−1}, w_{i−1}) − 59 f(t_{i−2}, w_{i−2})
                        + 37 f(t_{i−3}, w_{i−3}) − 9 f(t_{i−4}, w_{i−4})];  (Predict w_i.)
                WC = w_{i−1} + (h/24)[9 f(t, WP) + 19 f(t_{i−1}, w_{i−1})
                        − 5 f(t_{i−2}, w_{i−2}) + f(t_{i−3}, w_{i−3})];  (Correct w_i.)
                σ = 19|WC − WP|/(270h).
    Step 6  If σ ≤ TOL then do Steps 7-16  (Result accepted.)
                else do Steps 17-19.  (Result rejected.)
        Step 7  Set w_i = WC;  (Result accepted.)
                    t_i = t.
        Step 8  If NFLAG = 1 then for j = i − 3, i − 2, i − 1, i
                        OUTPUT (j, t_j, w_j, h);
                        (Previous results also accepted.)
                    else OUTPUT (i, t_i, w_i, h).
                        (Previous results already accepted.)
        Step 9  If LAST = 1 then set FLAG = 0  (Next step is 20.)
                    else do Steps 10-16.
            Step 10 Set i = i + 1;
                        NFLAG = 0.
            Step 11 If σ ≤ 0.1 TOL or t_{i−1} + h > b then do Steps 12-16.
                    (Increase h if it is more accurate than required or decrease
                    h to include b as a mesh point.)
                Step 12 Set q = (TOL/(2σ))^{1/4}.
                Step 13 If q > 4 then set h = 4h
                            else set h = qh.
                Step 14 If h > hmax then set h = hmax.
                Step 15 If t_{i−1} + 4h > b then
                            set h = (b − t_{i−1})/4;
                                LAST = 1.
                Step 16 Call RK4(h, w_{i−1}, t_{i−1}, w_i, t_i, w_{i+1}, t_{i+1}, w_{i+2}, t_{i+2});
                        Set NFLAG = 1;
                            i = i + 3.  (True branch completed. Next step is 20.)
        Step 17 Set q = (TOL/(2σ))^{1/4}.  (False branch from Step 6: Result rejected.)
        Step 18 If q < 0.1 then set h = 0.1h
                    else set h = qh.
        Step 19 If h < hmin then set FLAG = 0;
                        OUTPUT ('hmin exceeded')
                    else
                        if NFLAG = 1 then set i = i − 3;
                            (Previous results also rejected.)
                        Call RK4(h, w_{i−1}, t_{i−1}, w_i, t_i, w_{i+1}, t_{i+1}, w_{i+2}, t_{i+2});
                        set i = i + 3;
                            NFLAG = 1.
    Step 20 Set t = t_{i−1} + h.
Step 21 STOP.
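A compact Python rendering of the control flow may help when reading the nested steps. It is a sketch under stated assumptions, not a line-by-line transcription: the function names are hypothetical, tentative Runge-Kutta values are kept in a `pending` list instead of being tracked through NFLAG and index arithmetic, a zero error estimate is treated as allowing the maximum growth factor, and the LAST indicator is cleared whenever a step is rejected (a deviation made here so that a rejected final step cannot end the run early). It also assumes $b - a \ge 4\,\mathrm{hmax}$.

```python
def rk4_start(f, h, t0, w0):
    """Return three Runge-Kutta order-four steps [(t1, w1), (t2, w2), (t3, w3)]."""
    points, t, w = [], t0, w0
    for _ in range(3):
        k1 = h * f(t, w)
        k2 = h * f(t + h / 2, w + k1 / 2)
        k3 = h * f(t + h / 2, w + k2 / 2)
        k4 = h * f(t + h, w + k3)
        w = w + (k1 + 2 * k2 + 2 * k3 + k4) / 6
        t = t + h
        points.append((t, w))
    return points


def adams_variable_step(f, a, b, alpha, tol, hmax, hmin):
    """Variable step-size Adams predictor-corrector; returns accepted (ts, ws, hs)."""
    ts, ws, hs = [a], [alpha], [0.0]      # hs[i] is the step size that produced ts[i]
    h, last = hmax, False
    pending = rk4_start(f, h, a, alpha)   # tentative values, not yet accepted
    while True:
        # Four most recent equally spaced points, oldest first.
        recent = (list(zip(ts[-4:], ws[-4:])) + pending)[-4:]
        (t0, w0), (t1, w1), (t2, w2), (t3, w3) = recent
        t = t3 + h
        f3, f2, f1, f0 = f(t3, w3), f(t2, w2), f(t1, w1), f(t0, w0)
        wp = w3 + h * (55 * f3 - 59 * f2 + 37 * f1 - 9 * f0) / 24
        wc = w3 + h * (9 * f(t, wp) + 19 * f3 - 5 * f2 + f1) / 24
        sigma = 19 * abs(wc - wp) / (270 * h)
        q = 4.0 if sigma == 0 else (tol / (2 * sigma)) ** 0.25
        if sigma <= tol:                   # result accepted
            for tj, wj in pending + [(t, wc)]:
                ts.append(tj)
                ws.append(wj)
                hs.append(h)
            pending = []
            if last or t >= b:
                return ts, ws, hs
            if sigma <= 0.1 * tol or t + h > b:
                h = min(4 * h, q * h, hmax)
                if t + 4 * h > b:          # land exactly on b with four more steps
                    h = (b - t) / 4
                    last = True
                pending = rk4_start(f, h, t, wc)
        else:                              # result rejected: shrink h and restart
            h = max(q, 0.1) * h
            if h < hmin:
                raise RuntimeError("hmin exceeded")
            last = False
            pending = rk4_start(f, h, ts[-1], ws[-1])
```

Because of the differences noted above and of rounding in intermediate values, the mesh produced by this sketch need not coincide digit for digit with a tabulated run of Algorithm 5.5, although the accepted error estimates stay within the tolerance.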


Example 1 Use the Adams variable step-size predictor-corrector method with maximum step size $hmax = 0.2$, minimum step size $hmin = 0.01$, and tolerance $TOL = 10^{-5}$ to approximate the solution of the initial-value problem

\[
y' = y - t^2 + 1, \quad 0 \le t \le 2, \quad y(0) = 0.5.
\]

Solution We begin with $h = hmax = 0.2$, and obtain $w_0$, $w_1$, $w_2$, and $w_3$ using Runge-Kutta, then find $w_{4p}$ and $w_4$ by applying the predictor-corrector method. These calculations were done in Example 5 of Section 5.6 where it was determined that the Runge-Kutta approximations are

\[
y(0) = w_0 = 0.5, \quad y(0.2) \approx w_1 = 0.8292933, \quad y(0.4) \approx w_2 = 1.2140762, \quad \text{and} \quad y(0.6) \approx w_3 = 1.6489220.
\]

The predictor and corrector gave

\[
y(0.8) \approx w_{4p} = w_3 + \frac{0.2}{24} \big( 55 f(0.6, w_3) - 59 f(0.4, w_2) + 37 f(0.2, w_1) - 9 f(0, w_0) \big) = 2.1272892,
\]

and

\[
y(0.8) \approx w_4 = w_3 + \frac{0.2}{24} \big( 9 f(0.8, w_{4p}) + 19 f(0.6, w_3) - 5 f(0.4, w_2) + f(0.2, w_1) \big) = 2.1272056.
\]

We now need to determine if these approximations are sufficiently accurate or if there needs to be a change in the step size. First we find

\[
\sigma = \frac{19}{270h} |w_4 - w_{4p}| = \frac{19}{270(0.2)} |2.1272056 - 2.1272892| = 2.941 \times 10^{-5}.
\]

Because this exceeds the tolerance of $10^{-5}$ a new step size is needed and the new step size is

\[
qh = \left( \frac{10^{-5}}{2\sigma} \right)^{1/4} (0.2) = \left( \frac{10^{-5}}{2(2.941 \times 10^{-5})} \right)^{1/4} (0.2) = 0.642(0.2) \approx 0.128.
\]

As a consequence, we need to begin the procedure again computing the Runge-Kutta values with this step size, and then use the predictor-corrector method with this same step size to compute the new values of $w_{4p}$ and $w_4$. We then need to run the accuracy check on these approximations to see that we have been successful. Table 5.15 shows that this second run is successful and lists all the results obtained using Algorithm 5.5.
Table 5.15

 t_i         y(t_i)      w_i         h_i         σ_i             |y(t_i) − w_i|
 0           0.5         0.5
 0.1257017   0.7002323   0.7002318   0.1257017   4.051 × 10^-6   0.0000005
 0.2514033   0.9230960   0.9230949   0.1257017   4.051 × 10^-6   0.0000011
 0.3771050   1.1673894   1.1673877   0.1257017   4.051 × 10^-6   0.0000017
 0.5028066   1.4317502   1.4317480   0.1257017   4.051 × 10^-6   0.0000022
 0.6285083   1.7146334   1.7146306   0.1257017   4.610 × 10^-6   0.0000028
 0.7542100   2.0142869   2.0142834   0.1257017   5.210 × 10^-6   0.0000035
 0.8799116   2.3287244   2.3287200   0.1257017   5.913 × 10^-6   0.0000043
 1.0056133   2.6556930   2.6556877   0.1257017   6.706 × 10^-6   0.0000054
 1.1313149   2.9926385   2.9926319   0.1257017   7.604 × 10^-6   0.0000066
 1.2570166   3.3366642   3.3366562   0.1257017   8.622 × 10^-6   0.0000080
 1.3827183   3.6844857   3.6844761   0.1257017   9.777 × 10^-6   0.0000097
 1.4857283   3.9697541   3.9697433   0.1030100   7.029 × 10^-6   0.0000108
 1.5887383   4.2527830   4.2527711   0.1030100   7.029 × 10^-6   0.0000120
 1.6917483   4.5310269   4.5310137   0.1030100   7.029 × 10^-6   0.0000133
 1.7947583   4.8016639   4.8016488   0.1030100   7.029 × 10^-6   0.0000151
 1.8977683   5.0615660   5.0615488   0.1030100   7.760 × 10^-6   0.0000172
 1.9233262   5.1239941   5.1239764   0.0255579   3.918 × 10^-8   0.0000177
 1.9488841   5.1854932   5.1854751   0.0255579   3.918 × 10^-8   0.0000181
 1.9744421   5.2460056   5.2459870   0.0255579   3.918 × 10^-8   0.0000186
 2.0000000   5.3054720   5.3054529   0.0255579   3.918 × 10^-8   0.0000191


EXERCISE SET 5.7

1. Use the Adams Variable Step-Size Predictor-Corrector Algorithm with tolerance $TOL = 10^{-4}$, $hmax = 0.25$, and $hmin = 0.025$ to approximate the solutions to the given initial-value problems. Compare the results to the actual values.
   a. $y' = te^{3t} - 2y$, $0 \le t \le 1$, $y(0) = 0$; actual solution $y(t) = \frac{1}{5} te^{3t} - \frac{1}{25} e^{3t} + \frac{1}{25} e^{-2t}$.
   b. $y' = 1 + (t - y)^2$, $2 \le t \le 3$, $y(2) = 1$; actual solution $y(t) = t + 1/(1 - t)$.
   c. $y' = 1 + y/t$, $1 \le t \le 2$, $y(1) = 2$; actual solution $y(t) = t \ln t + 2t$.
   d. $y' = \cos 2t + \sin 3t$, $0 \le t \le 1$, $y(0) = 1$; actual solution $y(t) = \frac{1}{2} \sin 2t - \frac{1}{3} \cos 3t + \frac{4}{3}$.

2. Use the Adams Variable Step-Size Predictor-Corrector Algorithm with $TOL = 10^{-4}$ to approximate the solutions to the following initial-value problems:
   a. $y' = (y/t)^2 + y/t$, $1 \le t \le 1.2$, $y(1) = 1$, with $hmax = 0.05$ and $hmin = 0.01$.
   b. $y' = \sin t + e^{-t}$, $0 \le t \le 1$, $y(0) = 0$, with $hmax = 0.2$ and $hmin = 0.01$.
   c. $y' = (y^2 + y)/t$, $1 \le t \le 3$, $y(1) = -2$, with $hmax = 0.4$ and $hmin = 0.01$.
   d. $y' = -ty + 4t/y$, $0 \le t \le 1$, $y(0) = 1$, with $hmax = 0.2$ and $hmin = 0.01$.

3. Use the Adams Variable Step-Size Predictor-Corrector Algorithm with tolerance $TOL = 10^{-6}$, $hmax = 0.5$, and $hmin = 0.02$ to approximate the solutions to the given initial-value problems. Compare the results to the actual values.
   a. $y' = y/t - (y/t)^2$, $1 \le t \le 4$, $y(1) = 1$; actual solution $y(t) = t/(1 + \ln t)$.
   b. $y' = 1 + y/t + (y/t)^2$, $1 \le t \le 3$, $y(1) = 0$; actual solution $y(t) = t \tan(\ln t)$.
   c. $y' = -(y + 1)(y + 3)$, $0 \le t \le 3$, $y(0) = -2$; actual solution $y(t) = -3 + 2(1 + e^{-2t})^{-1}$.
   d. $y' = (t + 2t^3) y^3 - ty$, $0 \le t \le 2$, $y(0) = \frac{1}{3}$; actual solution $y(t) = (3 + 2t^2 + 6e^{t^2})^{-1/2}$.

4. Construct an Adams Variable Step-Size Predictor-Corrector Algorithm based on the Adams-Bashforth five-step method and the Adams-Moulton four-step method. Repeat Exercise 3 using this new method.

5. An electrical circuit consists of a capacitor of constant capacitance $C = 1.1$ farads in series with a resistor of constant resistance $R_0 = 2.1$ ohms. A voltage $E(t) = 110 \sin t$ is applied at time $t = 0$. When the resistor heats up, the resistance becomes a function of the current $i$,

   \[ R(t) = R_0 + ki, \quad \text{where } k = 0.9, \]

   and the differential equation for $i(t)$ becomes

   \[ \left( 1 + \frac{2k}{R_0} i \right) \frac{di}{dt} + \frac{1}{R_0 C} i = \frac{1}{R_0 C} \frac{dE}{dt}. \]

   Find $i(2)$, assuming that $i(0) = 0$.


5.8 Extrapolation Methods


Extrapolation was used in Section 4.5 for the approximation of definite integrals, where we
found that by correctly averaging relatively inaccurate trapezoidal approximations,
exceedingly accurate new approximations were produced. In this section we will apply
extrapolation to increase the accuracy of approximations to the solution of initial-value problems.
As we have previously seen, the original approximations must have an error expansion of 
a specific form for the procedure to be successful. 

To apply extrapolation to solve initial-value problems, we use a technique based on the
Midpoint method:

$$w_{i+1} = w_{i-1} + 2hf(t_i, w_i), \quad \text{for } i \ge 1. \qquad (5.43)$$

This technique requires two starting values since both $w_0$ and $w_1$ are needed before the first
midpoint approximation, $w_2$, can be determined. One starting value is the initial condition
$w_0 = y(a) = \alpha$. To determine the second starting value, $w_1$, we apply Euler's method.
Subsequent approximations are obtained from (5.43). After a series of approximations of
this type is generated ending at a value $t$, an endpoint correction is performed that involves
the final two midpoint approximations. This produces an approximation $w(t, h)$ to $y(t)$ that
has the form

$$y(t) = w(t, h) + \sum_{k=1}^{\infty} \delta_k h^{2k}, \qquad (5.44)$$

where the $\delta_k$ are constants related to the derivatives of the solution $y(t)$. The important point
is that the $\delta_k$ do not depend on the step size $h$. The details of this procedure can be found in
the paper by Gragg [Gr].
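The procedure just described (one Euler step, repeated Midpoint steps, then the endpoint correction) can be sketched in Python as follows. This is only an illustrative sketch: the function name `midpoint_with_endpoint_correction` and the argument `q` (the number of substeps, so the substep size is `h/q`) are assumptions introduced here, not notation from the text.

```python
def midpoint_with_endpoint_correction(f, a, alpha, h, q):
    """Approximate y(a + h) for y' = f(t, y), y(a) = alpha, using q substeps.

    One Euler step, then q - 1 Midpoint steps (5.43), then the endpoint
    correction applied to the final two midpoint approximations.
    """
    hk = h / q
    w_prev = alpha                       # w_0
    w_curr = alpha + hk * f(a, alpha)    # w_1 from Euler's method
    t = a + hk
    for j in range(1, q):
        # Midpoint method: w_{j+1} = w_{j-1} + 2 * hk * f(t_j, w_j)
        w_prev, w_curr = w_curr, w_prev + 2 * hk * f(t, w_curr)
        t = a + (j + 1) * hk
    # Endpoint correction: average of the last two values plus a half Euler step
    return 0.5 * (w_curr + w_prev + hk * f(t, w_curr))
```

Calling the function with `q = 2`, `q = 4`, and `q = 6` produces the first-column entries that the extrapolation process later combines.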

To illustrate the extrapolation technique for solving

$$y'(t) = f(t, y), \quad a \le t \le b, \quad y(a) = \alpha,$$

assume that we have a fixed step size $h$. We wish to approximate $y(t_1) = y(a + h)$.
For the first extrapolation step we let $h_0 = h/2$ and use Euler's method with $w_0 = \alpha$
to approximate $y(a + h_0) = y(a + h/2)$ as

$$w_1 = w_0 + h_0 f(a, w_0).$$

We then apply the Midpoint method with $t_{i-1} = a$ and $t_i = a + h_0 = a + h/2$ to produce a
first approximation to $y(a + h) = y(a + 2h_0)$,

$$w_2 = w_0 + 2h_0 f(a + h_0, w_1).$$

The endpoint correction is applied to obtain the final approximation to $y(a + h)$ for the step
size $h_0$. This results in the $O(h_0^2)$ approximation to $y(t_1)$

$$y_{1,1} = \frac{1}{2}\left[w_2 + w_1 + h_0 f(a + 2h_0, w_2)\right].$$

We save the approximation $y_{1,1}$ and discard the intermediate results $w_1$ and $w_2$.
To obtain the next approximation, $y_{2,1}$, to $y(t_1)$, we let $h_1 = h/4$ and use Euler's method
with $w_0 = \alpha$ to obtain an approximation to $y(a + h_1) = y(a + h/4)$, which we will call $w_1$:

$$w_1 = w_0 + h_1 f(a, w_0).$$

Next we approximate $y(a + 2h_1) = y(a + h/2)$ with $w_2$, $y(a + 3h_1) = y(a + 3h/4)$
with $w_3$, and $y(a + 4h_1) = y(t_1)$ with $w_4$, using the Midpoint method:

$$w_2 = w_0 + 2h_1 f(a + h_1, w_1),$$
$$w_3 = w_1 + 2h_1 f(a + 2h_1, w_2),$$
$$w_4 = w_2 + 2h_1 f(a + 3h_1, w_3).$$

The endpoint correction is now applied to $w_3$ and $w_4$ to produce the improved $O(h_1^2)$
approximation to $y(t_1)$,

$$y_{2,1} = \frac{1}{2}\left[w_4 + w_3 + h_1 f(a + 4h_1, w_4)\right].$$


Because of the form of the error given in (5.44), the two approximations to $y(a + h)$
have the property that

$$y(a + h) = y_{1,1} + \delta_1\left(\frac{h}{2}\right)^2 + \delta_2\left(\frac{h}{2}\right)^4 + \cdots = y_{1,1} + \delta_1\frac{h^2}{4} + \delta_2\frac{h^4}{16} + \cdots$$

and

$$y(a + h) = y_{2,1} + \delta_1\left(\frac{h}{4}\right)^2 + \delta_2\left(\frac{h}{4}\right)^4 + \cdots = y_{2,1} + \delta_1\frac{h^2}{16} + \delta_2\frac{h^4}{256} + \cdots.$$

We can eliminate the $O(h^2)$ portion of this truncation error by averaging the two formulas
appropriately. Specifically, if we subtract the first formula from 4 times the second and
divide the result by 3, we have

$$y(a + h) = y_{2,1} + \frac{1}{3}(y_{2,1} - y_{1,1}) - \delta_2\frac{h^4}{64} + \cdots.$$

So the approximation to $y(t_1)$ given by

$$y_{2,2} = y_{2,1} + \frac{1}{3}(y_{2,1} - y_{1,1})$$

has error of order $O(h^4)$.

We next let $h_2 = h/6$ and apply Euler's method once followed by the Midpoint method
five times. Then we use the endpoint correction to determine the $O(h^2)$ approximation, $y_{3,1}$, to
$y(a + h) = y(t_1)$. This approximation can be averaged with $y_{2,1}$ to produce a second $O(h^4)$
approximation that we denote $y_{3,2}$. Then $y_{3,2}$ and $y_{2,2}$ are averaged to eliminate the $O(h^4)$
error terms and produce an approximation $y_{3,3}$ with error of order $O(h^6)$. Higher-order formulas
are generated by continuing the process.

The only significant difference between the extrapolation performed here and that
used for Romberg integration in Section 4.5 results from the way the subdivisions are
chosen. In Romberg integration there is a convenient formula for representing the Composite
Trapezoidal rule approximations that uses consecutive divisions of the step size by the
integers 1, 2, 4, 8, 16, 32, 64, ... This procedure permits the averaging process to proceed in
an easily followed manner.

We do not have a means for easily producing refined approximations for initial-value 
problems, so the divisions for the extrapolation technique are chosen to minimize the num- 
ber of required function evaluations. The averaging procedure arising from this choice of 
subdivision, shown in Table 5.16, is not as elementary, but, other than that, the process is 
the same as that used for Romberg integration. 


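Table 5.16

$y_{1,1} = w(t, h_0)$

$y_{2,1} = w(t, h_1)$    $y_{2,2} = y_{2,1} + \dfrac{h_1^2}{h_0^2 - h_1^2}(y_{2,1} - y_{1,1})$

$y_{3,1} = w(t, h_2)$    $y_{3,2} = y_{3,1} + \dfrac{h_2^2}{h_1^2 - h_2^2}(y_{3,1} - y_{2,1})$    $y_{3,3} = y_{3,2} + \dfrac{h_2^2}{h_0^2 - h_2^2}(y_{3,2} - y_{2,2})$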
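The averaging pattern of Table 5.16 can be written as a short Python routine. This is a sketch under stated assumptions: the name `extrapolation_table` and the list `q` (the integers dividing the basic step, so that $h_i = h/q_i$) are introduced here for illustration. Because only ratios of step sizes appear, $h$ itself cancels and the weights depend on `q` alone.

```python
def extrapolation_table(first_column, q):
    """Build the triangular extrapolation table row by row (0-based indices).

    first_column[i] is w(t, h/q[i]); entry [i][j] combines entry [i][j-1]
    with entry [i-1][j-1] using the weight h_i^2 / (h_{i-j}^2 - h_i^2).
    """
    table = []
    for i, y_first in enumerate(first_column):
        row = [y_first]
        for j in range(1, i + 1):
            # (h_{i-j} / h_i)^2 = (q[i] / q[i-j])^2, so the weight is 1 / (ratio - 1)
            ratio = (q[i] / q[i - j]) ** 2
            row.append(row[j - 1] + (row[j - 1] - table[i - 1][j - 1]) / (ratio - 1))
        table.append(row)
    return table
```

With `q = [2, 4, 6]` the three weights are 1/3, 4/5, and 1/8, matching the coefficients in the table above.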
Algorithm 5.6 uses the extrapolation technique with the sequence of integers

$$q_0 = 2,\ q_1 = 4,\ q_2 = 6,\ q_3 = 8,\ q_4 = 12,\ q_5 = 16,\ q_6 = 24,\ \text{and } q_7 = 32.$$

(Margin note: Algorithm 5.6 uses nodes of the form $2^n$ and $2^n \cdot 3$. Other choices can be used.)


A basic step size $h$ is selected, and the method progresses by using $h_i = h/q_i$, for each $i =
0, \ldots, 7$, to approximate $y(t + h)$. The error is controlled by requiring that the approximations
$y_{1,1}, y_{2,2}, \ldots$ be computed until $|y_{i,i} - y_{i-1,i-1}|$ is less than a given tolerance. If the tolerance
is not achieved by $i = 8$, then $h$ is reduced, and the process is reapplied.

Minimum and maximum values of $h$, $hmin$ and $hmax$, respectively, are specified to
ensure control of the method. If $y_{i,i}$ is found to be acceptable, then $w_1$ is set to $y_{i,i}$ and
computations begin again to determine $w_2$, which will approximate $y(t_2) = y(a + 2h)$. The
process is repeated until the approximation $w_N$ to $y(b)$ is determined.


ALGORITHM 5.6 Extrapolation

To approximate the solution of the initial-value problem

$$y' = f(t, y), \quad a \le t \le b, \quad y(a) = \alpha,$$

with local truncation error within a given tolerance:

INPUT endpoints $a$, $b$; initial condition $\alpha$; tolerance TOL; maximum step size hmax;
minimum step size hmin.

OUTPUT $T$, $W$, $h$ where $W$ approximates $y(t)$ and step size $h$ was used, or a message
that minimum step size was exceeded.

Step 1 Initialize the array NK = (2, 4, 6, 8, 12, 16, 24, 32).

Step 2 Set TO = $a$;
           WO = $\alpha$;
           $h$ = hmax;
           FLAG = 1. (FLAG is used to exit the loop in Step 4.)

Step 3 For $i = 1, 2, \ldots, 7$
           for $j = 1, \ldots, i$
               set $Q_{i,j} = (NK_{i+1}/NK_j)^2$. (Note: $Q_{i,j} = h_j^2/h_{i+1}^2$.)

Step 4 While (FLAG = 1) do Steps 5-20.

    Step 5 Set $k = 1$;
               NFLAG = 0. (When desired accuracy is achieved, NFLAG is set to 1.)

    Step 6 While ($k \le 8$ and NFLAG = 0) do Steps 7-14.

        Step 7 Set HK = $h/NK_k$;
                   T = TO;
                   W2 = WO;
                   W3 = W2 + HK $\cdot f(T, W2)$; (Euler's first step.)
                   T = TO + HK.

        Step 8 For $j = 1, \ldots, NK_k - 1$
                   set W1 = W2;
                       W2 = W3;
                       W3 = W1 + 2HK $\cdot f(T, W2)$; (Midpoint method.)
                       T = TO + $(j + 1) \cdot$ HK.

        Step 9 Set $y_k = [W3 + W2 + HK \cdot f(T, W3)]/2$.
                   (Endpoint correction to compute $y_{k,1}$.)

        Step 10 If $k \ge 2$ then do Steps 11-13.
                   (Note: $y_{k-1} \equiv y_{k-1,1}$, $y_{k-2} \equiv y_{k-1,2}$, ..., $y_1 \equiv y_{k-1,k-1}$ since only
                   the previous row of the table is saved.)

            Step 11 Set $j = k$;
                        $v = y_1$. (Save $y_{k-1,k-1}$.)

            Step 12 While ($j \ge 2$) do
                        set $y_{j-1} = y_j + \dfrac{y_j - y_{j-1}}{Q_{k-1,j-1} - 1}$;
                            (Extrapolation to compute $y_{j-1} \equiv y_{k,k-j+2}$.)
                            (Note: $y_{j-1} = \dfrac{h_{j-1}^2 y_j - h_k^2 y_{j-1}}{h_{j-1}^2 - h_k^2}$.)
                            $j = j - 1$.

            Step 13 If $|y_1 - v| \le$ TOL then set NFLAG = 1.
                        ($y_1$ is accepted as the new $w$.)

        Step 14 Set $k = k + 1$.

    Step 15 Set $k = k - 1$.

    Step 16 If NFLAG = 0 then do Steps 17 and 18 (Result rejected.)
                else do Steps 19 and 20. (Result accepted.)

        Step 17 Set $h = h/2$. (New value for $w$ rejected, decrease $h$.)

        Step 18 If $h <$ hmin then
                    OUTPUT ('hmin exceeded');
                    Set FLAG = 0.
                    (True branch completed, next step is back to Step 4.)

        Step 19 Set WO = $y_1$; (New value for $w$ accepted.)
                    TO = TO + $h$;
                    OUTPUT (TO, WO, $h$).

        Step 20 If TO $\ge b$ then set FLAG = 0
                    (Procedure completed successfully.)
                else if TO + $h > b$ then set $h = b -$ TO
                    (Terminate at $t = b$.)
                else if ($k \le 3$ and $h < 0.5$(hmax)) then set $h = 2h$.
                    (Increase step size if possible.)

Step 21 STOP.
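The steps above can be sketched in Python as follows. This is an illustrative translation under stated assumptions, not a reference implementation: the function names are hypothetical, each table row is stored as a list instead of being overwritten in place as in Steps 11 and 12, the "hmin exceeded" message is raised as an exception, and a small floating-point guard (not part of the algorithm) is used when testing whether $b$ has been reached.

```python
def gragg_value(f, t0, w0, h, q):
    """Steps 7-9: Euler start, q - 1 Midpoint steps, endpoint correction."""
    hk = h / q
    w_prev, w_curr = w0, w0 + hk * f(t0, w0)
    t = t0 + hk
    for j in range(1, q):
        w_prev, w_curr = w_curr, w_prev + 2 * hk * f(t, w_curr)
        t = t0 + (j + 1) * hk
    return 0.5 * (w_curr + w_prev + hk * f(t, w_curr))


def extrapolation(f, a, b, alpha, tol, hmax, hmin):
    """Return a list of (t, w, h) triples, one per accepted step."""
    nk = [2, 4, 6, 8, 12, 16, 24, 32]
    t0, w0, h = a, alpha, hmax
    results = []
    while True:
        prev_row, accepted, k = [], False, 0
        while k < len(nk) and not accepted:
            # First column entry, then extrapolate across the row.
            row = [gragg_value(f, t0, w0, h, nk[k])]
            for j in range(1, k + 1):
                ratio = (nk[k] / nk[k - j]) ** 2
                row.append(row[j - 1] + (row[j - 1] - prev_row[j - 1]) / (ratio - 1))
            # Step 13: compare the new diagonal entry with the previous one.
            if k >= 1 and abs(row[-1] - prev_row[-1]) <= tol:
                accepted = True
            prev_row = row
            k += 1          # after the loop, k is the number of rows used
        if not accepted:
            h = h / 2       # Step 17
            if h < hmin:    # Step 18
                raise RuntimeError("hmin exceeded")
            continue
        w0 = prev_row[-1]   # Step 19
        t0 = t0 + h
        results.append((t0, w0, h))
        # Step 20, with a rounding guard that is an addition of this sketch.
        if t0 >= b - 1e-12 * max(1.0, abs(b)):
            break
        if t0 + h > b:
            h = b - t0
        elif k <= 3 and h < 0.5 * hmax:
            h = 2 * h
    return results
```

A typical call is `extrapolation(lambda t, y: y - t * t + 1, 0.0, 2.0, 0.5, 1e-9, 0.2, 0.01)`. Because this sketch runs in double precision, the sequence of accepted step sizes need not match a hand or low-precision computation exactly.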


Example 1 Use the extrapolation method with maximum step size $hmax = 0.2$, minimum step size
$hmin = 0.01$, and tolerance $TOL = 10^{-9}$ to approximate the solution of the initial-value
problem

$$y' = y - t^2 + 1, \quad 0 \le t \le 2, \quad y(0) = 0.5.$$

Solution For the first step of the extrapolation method we let $w_0 = 0.5$, $t_0 = 0$, and $h = 0.2$.
Then we compute

$$h_0 = h/2 = 0.1;$$
$$w_1 = w_0 + h_0 f(t_0, w_0) = 0.5 + 0.1(1.5) = 0.65;$$
$$w_2 = w_0 + 2h_0 f(t_0 + h_0, w_1) = 0.5 + 0.2(1.64) = 0.828;$$

and the first approximation to $y(0.2)$ is

$$y_{1,1} = \frac{1}{2}\left(w_2 + w_1 + h_0 f(t_0 + 2h_0, w_2)\right) = \frac{1}{2}\left(0.828 + 0.65 + 0.1 f(0.2, 0.828)\right) = 0.8284.$$

For the second approximation to $y(0.2)$ we compute

$$h_1 = h/4 = 0.05;$$
$$w_1 = w_0 + h_1 f(t_0, w_0) = 0.5 + 0.05(1.5) = 0.575;$$
$$w_2 = w_0 + 2h_1 f(t_0 + h_1, w_1) = 0.5 + 0.1(1.5725) = 0.65725;$$
$$w_3 = w_1 + 2h_1 f(t_0 + 2h_1, w_2) = 0.575 + 0.1(1.64725) = 0.739725;$$
$$w_4 = w_2 + 2h_1 f(t_0 + 3h_1, w_3) = 0.65725 + 0.1(1.717225) = 0.8289725.$$

Then the endpoint correction approximation is

$$y_{2,1} = \frac{1}{2}\left(w_4 + w_3 + h_1 f(t_0 + 4h_1, w_4)\right) = \frac{1}{2}\left(0.8289725 + 0.739725 + 0.05 f(0.2, 0.8289725)\right) = 0.8290730625.$$

This gives the first extrapolation approximation

$$y_{2,2} = y_{2,1} + \frac{(1/4)^2}{(1/2)^2 - (1/4)^2}\,(y_{2,1} - y_{1,1}) = 0.8292974167.$$

The third approximation is found by computing

$$h_2 = h/6 = 0.0\overline{3};$$
$$w_1 = w_0 + h_2 f(t_0, w_0) = 0.55;$$
$$w_2 = w_0 + 2h_2 f(t_0 + h_2, w_1) = 0.6032592593;$$
$$w_3 = w_1 + 2h_2 f(t_0 + 2h_2, w_2) = 0.6565876543;$$
$$w_4 = w_2 + 2h_2 f(t_0 + 3h_2, w_3) = 0.7130317696;$$
$$w_5 = w_3 + 2h_2 f(t_0 + 4h_2, w_4) = 0.7696045871;$$
$$w_6 = w_4 + 2h_2 f(t_0 + 5h_2, w_5) = 0.8291535569;$$

then the endpoint correction approximation

$$y_{3,1} = \frac{1}{2}\left(w_6 + w_5 + h_2 f(t_0 + 6h_2, w_6)\right) = 0.8291982979.$$

We can now find two extrapolated approximations,

$$y_{3,2} = y_{3,1} + \frac{(1/6)^2}{(1/4)^2 - (1/6)^2}\,(y_{3,1} - y_{2,1}) = 0.8292984862,$$

and

$$y_{3,3} = y_{3,2} + \frac{(1/6)^2}{(1/2)^2 - (1/6)^2}\,(y_{3,2} - y_{2,2}) = 0.8292986199.$$

Because

$$|y_{3,3} - y_{2,2}| = 1.2 \times 10^{-6}$$

does not satisfy the tolerance, we need to compute at least one more row of the extrapolation
table. We use $h_3 = h/8 = 0.025$ and calculate $w_1$ by Euler's method, $w_2, \ldots, w_8$
by the Midpoint method, and apply the endpoint correction. This will give us the new
approximation $y_{4,1}$, which permits us to compute the new extrapolation row

$$y_{4,1} = 0.8292421745 \quad y_{4,2} = 0.8292985873 \quad y_{4,3} = 0.8292986210 \quad y_{4,4} = 0.8292986211$$

Comparing $|y_{4,4} - y_{3,3}| = 1.2 \times 10^{-9}$, we find that the accuracy tolerance has not been reached.
To obtain the entries in the next row, we use $h_4 = h/12 = 0.01\overline{6}$. First calculate $w_1$ by Euler's
method, then $w_2$ through $w_{12}$ by the Midpoint method. Finally use the endpoint correction
to obtain $y_{5,1}$. The remaining entries in the fifth row are obtained using extrapolation, and are
shown in Table 5.17. Because $y_{5,5} = 0.8292986213$ is within $10^{-9}$ of $y_{4,4}$, it is accepted as the
approximation to $y(0.2)$. The procedure begins anew to approximate $y(0.4)$. The complete
set of approximations accurate to the places listed is given in Table 5.18.

Table 5.17

$y_{1,1} = 0.8284000000$
$y_{2,1} = 0.8290730625$   $y_{2,2} = 0.8292974167$
$y_{3,1} = 0.8291982979$   $y_{3,2} = 0.8292984862$   $y_{3,3} = 0.8292986199$
$y_{4,1} = 0.8292421745$   $y_{4,2} = 0.8292985873$   $y_{4,3} = 0.8292986210$   $y_{4,4} = 0.8292986211$
$y_{5,1} = 0.8292735291$   $y_{5,2} = 0.8292986128$   $y_{5,3} = 0.8292986213$   $y_{5,4} = 0.8292986213$   $y_{5,5} = 0.8292986213$


Table 5.18

$t_i$    $y_i = y(t_i)$    $w_i$    $h_i$    $k$
0.200 0.8292986210 0.8292986213 0.200 5
0.400 1.2140876512 1.2140876510 0.200 4
0.600 1.6489405998 1.6489406000 0.200 4
0.700 1.8831236462 1.8831236460 0.100 5
0.800 2.1272295358 2.1272295360 0.100 4
0.900 2.3801984444 2.3801984450 0.100 7
0.925 2.4446908698 2.4446908710 0.025 8
0.950 2.5096451704 2.5096451700 0.025 3
1.000 2.6408590858 2.6408590860 0.050 3
1.100 2.9079169880 2.9079169880 0.100 7
1.200 3.1799415386 3.1799415380 0.100 6
1.300 3.4553516662 3.4553516610 0.100 8
1.400 3.7324000166 3.7324000100 0.100 5
1.450 3.8709427424 3.8709427340 0.050 7
1.475 3.9401071136 3.9401071050 0.025 3
1.525 4.0780532154 4.0780532060 0.050 4
1.575 4.2152541820 4.2152541820 0.050 3
1.675 4.4862274254 4.4862274160 0.100 4
1.775 4.7504844318 4.7504844210 0.100 4
1.825 4.8792274904 4.8792274790 0.050 3
1.875 5.0052154398 5.0052154290 0.050 3
1.925 5.1280506670 5.1280506570 0.050 4
1.975 5.2473151731 5.2473151660 0.050 8
2.000 5.3054719506 5.3054719440 0.025 3


The proof that the method presented in Algorithm 5.6 converges involves results from 
summability theory; it can be found in the original paper of Gragg [Gr]. A number of other 
extrapolation procedures are available, some of which use the variable step-size techniques. 
For additional procedures based on the extrapolation process, see the Bulirsch and Stoer 
papers [BS1], [BS2], [BS3] or the text by Stetter [Stet]. The methods used by Bulirsch and 
Stoer involve interpolation with rational functions instead of the polynomial interpolation 
used in the Gragg procedure. 


EXERCISE SET 5.8

1. Use the Extrapolation Algorithm with tolerance $TOL = 10^{-4}$, $hmax = 0.25$, and $hmin = 0.05$ to
approximate the solutions to the following initial-value problems. Compare the results to the actual
values.

a. $y' = te^{3t} - 2y$, $0 \le t \le 1$, $y(0) = 0$; actual solution $y(t) = \frac{1}{5}te^{3t} - \frac{1}{25}e^{3t} + \frac{1}{25}e^{-2t}$.
b. $y' = 1 + (t - y)^2$, $2 \le t \le 3$, $y(2) = 1$; actual solution $y(t) = t + 1/(1 - t)$.
c. $y' = 1 + y/t$, $1 \le t \le 2$, $y(1) = 2$; actual solution $y(t) = t \ln t + 2t$.
d. $y' = \cos 2t + \sin 3t$, $0 \le t \le 1$, $y(0) = 1$; actual solution $y(t) = \frac{1}{2}\sin 2t - \frac{1}{3}\cos 3t + \frac{4}{3}$.

2. Use the Extrapolation Algorithm with $TOL = 10^{-4}$ to approximate the solutions to the following
initial-value problems:

a. $y' = (y/t)^2 + y/t$, $1 \le t \le 1.2$, $y(1) = 1$, with $hmax = 0.05$ and $hmin = 0.02$.
b. $y' = \sin t + e^{-t}$, $0 \le t \le 1$, $y(0) = 0$, with $hmax = 0.25$ and $hmin = 0.02$.
c. $y' = (y^2 + y)/t$, $1 \le t \le 3$, $y(1) = -2$, with $hmax = 0.5$ and $hmin = 0.02$.
d. $y' = -ty + 4t/y$, $0 \le t \le 1$, $y(0) = 1$, with $hmax = 0.25$ and $hmin = 0.02$.

3. Use the Extrapolation Algorithm with tolerance $TOL = 10^{-6}$, $hmax = 0.5$, and $hmin = 0.05$ to
approximate the solutions to the following initial-value problems. Compare the results to the actual
values.

a. $y' = y/t - (y/t)^2$, $1 \le t \le 4$, $y(1) = 1$; actual solution $y(t) = t/(1 + \ln t)$.
b. $y' = 1 + y/t + (y/t)^2$, $1 \le t \le 3$, $y(1) = 0$; actual solution $y(t) = t\tan(\ln t)$.
c. $y' = -(y + 1)(y + 3)$, $0 \le t \le 3$, $y(0) = -2$; actual solution $y(t) = -3 + 2(1 + e^{-2t})^{-1}$.
d. $y' = (t + 2t^3)y^3 - ty$, $0 \le t \le 2$, $y(0) = \frac{1}{3}$; actual solution $y(t) = (3 + 2t^2 + 6e^{t^2})^{-1/2}$.

4. Let $P(t)$ be the number of individuals in a population at time $t$, measured in years. If the average birth
rate $b$ is constant and the average death rate $d$ is proportional to the size of the population (due to
overcrowding), then the growth rate of the population is given by the logistic equation

$$\frac{dP(t)}{dt} = bP(t) - k[P(t)]^2,$$

where $d = kP(t)$. Suppose $P(0) = 50{,}976$, $b = 2.9 \times 10^{-2}$, and $k = 1.4 \times 10^{-7}$. Find the population
after 5 years.


5.9 Higher-Order Equations and Systems of Differential Equations 


This section contains an introduction to the numerical solution of higher-order initial-value
problems. The techniques discussed are limited to those that transform a higher-order
equation into a system of first-order differential equations. Before discussing the transformation
procedure, some remarks are needed concerning systems that involve first-order differential
equations.

An $m$th-order system of first-order initial-value problems has the form

$$\begin{aligned}
\frac{du_1}{dt} &= f_1(t, u_1, u_2, \ldots, u_m),\\
\frac{du_2}{dt} &= f_2(t, u_1, u_2, \ldots, u_m),\\
&\ \ \vdots\\
\frac{du_m}{dt} &= f_m(t, u_1, u_2, \ldots, u_m),
\end{aligned} \qquad (5.45)$$

for $a \le t \le b$, with the initial conditions

$$u_1(a) = \alpha_1,\ u_2(a) = \alpha_2,\ \ldots,\ u_m(a) = \alpha_m. \qquad (5.46)$$

The object is to find $m$ functions $u_1(t), u_2(t), \ldots, u_m(t)$ that satisfy each of the differential
equations together with all the initial conditions.

To discuss existence and uniqueness of solutions to systems of equations, we need to
extend the definition of the Lipschitz condition to functions of several variables.

Definition 5.16 The function $f(t, y_1, \ldots, y_m)$, defined on the set

$$D = \{(t, u_1, \ldots, u_m) \mid a \le t \le b \text{ and } -\infty < u_i < \infty, \text{ for each } i = 1, 2, \ldots, m\}$$

is said to satisfy a Lipschitz condition on $D$ in the variables $u_1, u_2, \ldots, u_m$ if a constant
$L > 0$ exists with

$$|f(t, u_1, \ldots, u_m) - f(t, z_1, \ldots, z_m)| \le L \sum_{j=1}^{m} |u_j - z_j|, \qquad (5.47)$$

for all $(t, u_1, \ldots, u_m)$ and $(t, z_1, \ldots, z_m)$ in $D$.


By using the Mean Value Theorem, it can be shown that if $f$ and its first partial
derivatives are continuous on $D$ and if

$$\left|\frac{\partial f(t, u_1, \ldots, u_m)}{\partial u_i}\right| \le L,$$

for each $i = 1, 2, \ldots, m$ and all $(t, u_1, \ldots, u_m)$ in $D$, then $f$ satisfies a Lipschitz condition on
$D$ with Lipschitz constant $L$ (see [BiR], p. 141). A basic existence and uniqueness theorem
follows. Its proof can be found in [BiR], pp. 152-154.


Theorem 5.17 Suppose that

$$D = \{(t, u_1, u_2, \ldots, u_m) \mid a \le t \le b \text{ and } -\infty < u_i < \infty, \text{ for each } i = 1, 2, \ldots, m\},$$

and let $f_i(t, u_1, \ldots, u_m)$, for each $i = 1, 2, \ldots, m$, be continuous and satisfy a Lipschitz
condition on $D$. The system of first-order differential equations (5.45), subject to the initial
conditions (5.46), has a unique solution $u_1(t), \ldots, u_m(t)$, for $a \le t \le b$.


Methods to solve systems of first-order differential equations are generalizations of the 
methods for a single first-order equation presented earlier in this chapter. For example, the 
classical Runge-Kutta method of order four given by 


$$\begin{aligned}
w_0 &= \alpha,\\
k_1 &= hf(t_i, w_i),\\
k_2 &= hf\left(t_i + \frac{h}{2},\ w_i + \frac{1}{2}k_1\right),\\
k_3 &= hf\left(t_i + \frac{h}{2},\ w_i + \frac{1}{2}k_2\right),\\
k_4 &= hf(t_{i+1},\ w_i + k_3),\\
w_{i+1} &= w_i + \frac{1}{6}(k_1 + 2k_2 + 2k_3 + k_4), \quad \text{for each } i = 0, 1, \ldots, N - 1,
\end{aligned}$$

used to solve the first-order initial-value problem

$$y' = f(t, y), \quad a \le t \le b, \quad y(a) = \alpha,$$

is generalized as follows.

Let an integer $N > 0$ be chosen and set $h = (b - a)/N$. Partition the interval $[a, b]$ into
$N$ subintervals with the mesh points

$$t_j = a + jh, \quad \text{for each } j = 0, 1, \ldots, N.$$

Use the notation $w_{ij}$, for each $j = 0, 1, \ldots, N$ and $i = 1, 2, \ldots, m$, to denote an
approximation to $u_i(t_j)$. That is, $w_{ij}$ approximates the $i$th solution $u_i(t)$ of (5.45) at the $j$th mesh
point $t_j$. For the initial conditions, set (see Figure 5.6)

$$w_{1,0} = \alpha_1,\ w_{2,0} = \alpha_2,\ \ldots,\ w_{m,0} = \alpha_m. \qquad (5.48)$$


Figure 5.6 [figure not reproduced]

Suppose that the values $w_{1,j}, w_{2,j}, \ldots, w_{m,j}$ have been computed. We obtain $w_{1,j+1},
w_{2,j+1}, \ldots, w_{m,j+1}$ by first calculating

$$k_{1,i} = hf_i(t_j, w_{1,j}, w_{2,j}, \ldots, w_{m,j}), \quad \text{for each } i = 1, 2, \ldots, m; \qquad (5.49)$$

$$k_{2,i} = hf_i\left(t_j + \frac{h}{2},\ w_{1,j} + \frac{1}{2}k_{1,1},\ w_{2,j} + \frac{1}{2}k_{1,2},\ \ldots,\ w_{m,j} + \frac{1}{2}k_{1,m}\right), \qquad (5.50)$$

for each $i = 1, 2, \ldots, m$;

$$k_{3,i} = hf_i\left(t_j + \frac{h}{2},\ w_{1,j} + \frac{1}{2}k_{2,1},\ w_{2,j} + \frac{1}{2}k_{2,2},\ \ldots,\ w_{m,j} + \frac{1}{2}k_{2,m}\right), \qquad (5.51)$$

for each $i = 1, 2, \ldots, m$;

$$k_{4,i} = hf_i(t_j + h,\ w_{1,j} + k_{3,1},\ w_{2,j} + k_{3,2},\ \ldots,\ w_{m,j} + k_{3,m}), \qquad (5.52)$$

for each $i = 1, 2, \ldots, m$; and then

$$w_{i,j+1} = w_{i,j} + \frac{1}{6}(k_{1,i} + 2k_{2,i} + 2k_{3,i} + k_{4,i}), \qquad (5.53)$$

for each $i = 1, 2, \ldots, m$. Note that all the values $k_{1,1}, k_{1,2}, \ldots, k_{1,m}$ must be computed before
any of the terms of the form $k_{2,i}$ can be determined. In general, each $k_{l,1}, k_{l,2}, \ldots, k_{l,m}$ must be
computed before any of the expressions $k_{l+1,i}$. Algorithm 5.7 implements the Runge-Kutta
fourth-order method for systems of initial-value problems.

ALGORITHM 5.7 Runge-Kutta Method for Systems of Differential Equations

To approximate the solution of the $m$th-order system of first-order initial-value problems

$$u_j' = f_j(t, u_1, u_2, \ldots, u_m), \quad a \le t \le b, \quad \text{with } u_j(a) = \alpha_j,$$

for $j = 1, 2, \ldots, m$ at $(N + 1)$ equally spaced numbers in the interval $[a, b]$:

INPUT endpoints $a$, $b$; number of equations $m$; integer $N$; initial conditions $\alpha_1, \ldots, \alpha_m$.

OUTPUT approximations $w_j$ to $u_j(t)$ at the $(N + 1)$ values of $t$.

Step 1 Set $h = (b - a)/N$;
           $t = a$.

Step 2 For $j = 1, 2, \ldots, m$ set $w_j = \alpha_j$.

Step 3 OUTPUT $(t, w_1, w_2, \ldots, w_m)$.

Step 4 For $i = 1, 2, \ldots, N$ do Steps 5-11.

    Step 5 For $j = 1, 2, \ldots, m$ set
               $k_{1,j} = hf_j(t, w_1, w_2, \ldots, w_m)$.

    Step 6 For $j = 1, 2, \ldots, m$ set
               $k_{2,j} = hf_j\left(t + \frac{h}{2},\ w_1 + \frac{1}{2}k_{1,1},\ w_2 + \frac{1}{2}k_{1,2},\ \ldots,\ w_m + \frac{1}{2}k_{1,m}\right)$.

    Step 7 For $j = 1, 2, \ldots, m$ set
               $k_{3,j} = hf_j\left(t + \frac{h}{2},\ w_1 + \frac{1}{2}k_{2,1},\ w_2 + \frac{1}{2}k_{2,2},\ \ldots,\ w_m + \frac{1}{2}k_{2,m}\right)$.

    Step 8 For $j = 1, 2, \ldots, m$ set
               $k_{4,j} = hf_j(t + h,\ w_1 + k_{3,1},\ w_2 + k_{3,2},\ \ldots,\ w_m + k_{3,m})$.

    Step 9 For $j = 1, 2, \ldots, m$ set
               $w_j = w_j + (k_{1,j} + 2k_{2,j} + 2k_{3,j} + k_{4,j})/6$.

    Step 10 Set $t = a + ih$.

    Step 11 OUTPUT $(t, w_1, w_2, \ldots, w_m)$.

Step 12 STOP.
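A compact Python sketch of the steps above follows. The names `rk4_system` and `fs` are assumptions made for illustration: `fs` is a list of callables `f(t, u)` in which `u` is the list of current approximations. Each stage is computed for all components before the next stage starts, which is the ordering requirement stressed in the text.

```python
def rk4_system(fs, a, b, alphas, n):
    """Runge-Kutta order four for u_j' = fs[j](t, u), u(a) = alphas.

    Returns a list of (t, w) pairs at the n + 1 equally spaced mesh points.
    """
    h = (b - a) / n
    t, w = a, list(alphas)
    out = [(t, list(w))]
    for i in range(1, n + 1):
        # All k1 values first, then all k2 values, and so on.
        k1 = [h * f(t, w) for f in fs]
        k2 = [h * f(t + h / 2, [wj + kj / 2 for wj, kj in zip(w, k1)]) for f in fs]
        k3 = [h * f(t + h / 2, [wj + kj / 2 for wj, kj in zip(w, k2)]) for f in fs]
        k4 = [h * f(t + h, [wj + kj for wj, kj in zip(w, k3)]) for f in fs]
        w = [wj + (p + 2 * q + 2 * r + s) / 6
             for wj, p, q, r, s in zip(w, k1, k2, k3, k4)]
        t = a + i * h
        out.append((t, list(w)))
    return out
```

For a two-equation system such as $u_1' = -4u_1 + 3u_2 + 6$, $u_2' = -2.4u_1 + 1.6u_2 + 3.6$, one would pass `fs = [lambda t, u: -4*u[0] + 3*u[1] + 6, lambda t, u: -2.4*u[0] + 1.6*u[1] + 3.6]`.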


Illustration Kirchhoff's Law states that the sum of all instantaneous voltage changes around a closed
circuit is zero. This law implies that the current $I(t)$ in a closed circuit containing a resistance
of $R$ ohms, a capacitance of $C$ farads, an inductance of $L$ henries, and a voltage source of
$E(t)$ volts satisfies the equation

$$LI'(t) + RI(t) + \frac{1}{C}\int I(t)\,dt = E(t).$$

The currents $I_1(t)$ and $I_2(t)$ in the left and right loops, respectively, of the circuit shown in
Figure 5.7 are the solutions to the system of equations

$$2I_1(t) + 6[I_1(t) - I_2(t)] + 2I_1'(t) = 12,$$

$$\frac{1}{0.5}\int I_2(t)\,dt + 4I_2(t) + 6[I_2(t) - I_1(t)] = 0.$$

Figure 5.7 [circuit diagram not reproduced]

If the switch in the circuit is closed at time $t = 0$, we have the initial conditions $I_1(0) = 0$
and $I_2(0) = 0$. Solve for $I_1'(t)$ in the first equation, differentiate the second equation, and
substitute for $I_1'(t)$ to get

$$I_1' = f_1(t, I_1, I_2) = -4I_1 + 3I_2 + 6, \quad I_1(0) = 0,$$
$$I_2' = f_2(t, I_1, I_2) = 0.6I_1' - 0.2I_2 = -2.4I_1 + 1.6I_2 + 3.6, \quad I_2(0) = 0.$$
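The intermediate algebra behind these two equations is short and may be worth writing out. The following worked derivation uses only the loop equations stated above:

```latex
\begin{align*}
2I_1' &= 12 - 2I_1 - 6(I_1 - I_2)
  \;\Longrightarrow\; I_1' = -4I_1 + 3I_2 + 6,\\[4pt]
\frac{d}{dt}\Big[\tfrac{1}{0.5}\textstyle\int I_2\,dt + 4I_2 + 6(I_2 - I_1)\Big]
  &= 2I_2 + 10I_2' - 6I_1' = 0
  \;\Longrightarrow\; I_2' = 0.6I_1' - 0.2I_2,\\[4pt]
I_2' &= 0.6(-4I_1 + 3I_2 + 6) - 0.2I_2 = -2.4I_1 + 1.6I_2 + 3.6.
\end{align*}
```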


The exact solution to this system is

$$I_1(t) = -3.375e^{-2t} + 1.875e^{-0.4t} + 1.5,$$

$$I_2(t) = -2.25e^{-2t} + 2.25e^{-0.4t}.$$
We will apply the Runge-Kutta method of order four to this system with $h = 0.1$. Since
$w_{1,0} = I_1(0) = 0$ and $w_{2,0} = I_2(0) = 0$,

$$k_{1,1} = hf_1(t_0, w_{1,0}, w_{2,0}) = 0.1 f_1(0, 0, 0) = 0.1(-4(0) + 3(0) + 6) = 0.6,$$

$$k_{1,2} = hf_2(t_0, w_{1,0}, w_{2,0}) = 0.1 f_2(0, 0, 0) = 0.1(-2.4(0) + 1.6(0) + 3.6) = 0.36,$$

$$k_{2,1} = hf_1\left(t_0 + \frac{1}{2}h,\ w_{1,0} + \frac{1}{2}k_{1,1},\ w_{2,0} + \frac{1}{2}k_{1,2}\right) = 0.1 f_1(0.05, 0.3, 0.18)
= 0.1(-4(0.3) + 3(0.18) + 6) = 0.534,$$

$$k_{2,2} = hf_2\left(t_0 + \frac{1}{2}h,\ w_{1,0} + \frac{1}{2}k_{1,1},\ w_{2,0} + \frac{1}{2}k_{1,2}\right) = 0.1 f_2(0.05, 0.3, 0.18)
= 0.1(-2.4(0.3) + 1.6(0.18) + 3.6) = 0.3168.$$

Generating the remaining entries in a similar manner produces

$$k_{3,1} = (0.1) f_1(0.05, 0.267, 0.1584) = 0.54072,$$
$$k_{3,2} = (0.1) f_2(0.05, 0.267, 0.1584) = 0.321264,$$
$$k_{4,1} = (0.1) f_1(0.1, 0.54072, 0.321264) = 0.4800912,$$
$$k_{4,2} = (0.1) f_2(0.1, 0.54072, 0.321264) = 0.28162944.$$

As a consequence,

$$I_1(0.1) \approx w_{1,1} = w_{1,0} + \frac{1}{6}(k_{1,1} + 2k_{2,1} + 2k_{3,1} + k_{4,1})
= 0 + \frac{1}{6}(0.6 + 2(0.534) + 2(0.54072) + 0.4800912) = 0.5382552$$

and

$$I_2(0.1) \approx w_{2,1} = w_{2,0} + \frac{1}{6}(k_{1,2} + 2k_{2,2} + 2k_{3,2} + k_{4,2}) = 0.3196263.$$

The remaining entries in Table 5.19 are generated in a similar manner.

Table 5.19

$t_j$    $w_{1,j}$    $w_{2,j}$    $|I_1(t_j) - w_{1,j}|$    $|I_2(t_j) - w_{2,j}|$
0.0 0 0 0 0
0.1 0.5382550 0.3196263 0.8285 x 10^{-5} 0.5803 x 10^{-5}
0.2 0.9684983 0.5687817 0.1514 x 10^{-4} 0.9596 x 10^{-5}
0.3 1.310717 0.7607328 0.1907 x 10^{-4} 0.1216 x 10^{-4}
0.4 1.581263 0.9063208 0.2098 x 10^{-4} 0.1311 x 10^{-4}
0.5 1.793505 1.014402 0.2193 x 10^{-4} 0.1240 x 10^{-4}
Maple's NumericalAnalysis package does not currently approximate the solution to
systems of initial-value problems, but systems of first-order differential equations can be
solved using dsolve. (Margin note: Recall that Maple reserves the letter D to represent
differentiation.) The system in the Illustration is defined with

    sys2 := D(u1)(t) = -4*u1(t) + 3*u2(t) + 6, D(u2)(t) = -2.4*u1(t) + 1.6*u2(t) + 3.6

and the initial conditions with

    init2 := u1(0) = 0, u2(0) = 0

The system is solved with the command

    sol2 := dsolve({sys2, init2}, {u1(t), u2(t)})

and Maple responds with

$$\left\{u1(t) = -\frac{27}{8}e^{-2t} + \frac{15}{8}e^{-\frac{2}{5}t} + \frac{3}{2},\quad u2(t) = -\frac{9}{4}e^{-2t} + \frac{9}{4}e^{-\frac{2}{5}t}\right\}$$

To isolate the individual functions we use

    r1 := rhs(sol2[1]); r2 := rhs(sol2[2])

producing

$$-\frac{27}{8}e^{-2t} + \frac{15}{8}e^{-\frac{2}{5}t} + \frac{3}{2}$$
$$-\frac{9}{4}e^{-2t} + \frac{9}{4}e^{-\frac{2}{5}t}$$

and to determine the value of the functions at $t = 0.5$ we use

    evalf(subs(t = 0.5, r1)); evalf(subs(t = 0.5, r2))

giving, in agreement with Table 5.19,

    1.793527048
    1.014415451

The command dsolve will fail if an explicit solution cannot be found. In that case we
can use the numeric option in dsolve, which applies the Runge-Kutta-Fehlberg technique.
This technique can also be used, of course, when the exact solution can be determined with
dsolve. For example, with the system defined previously,

    g := dsolve({sys2, init2}, {u1(t), u2(t)}, numeric)

returns

    proc(x_rkf45) ... end proc

To approximate the solutions at $t = 0.5$, enter

    g(0.5)

which gives approximations in the form

    [t = 0.5, u2(t) = 1.014415563, u1(t) = 1.793527215]


Higher-Order Differential Equations 


Many important physical problems—for example, electrical circuits and vibrating systems— 
involve initial-value problems whose equations have orders higher than one. New techniques 
are not required for solving these problems. By relabeling the variables, we can reduce 
a higher-order differential equation into a system of first-order differential equations and 
then apply one of the methods we have already discussed. 

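A general $m$th-order initial-value problem

$$y^{(m)}(t) = f(t, y, y', \ldots, y^{(m-1)}), \quad a \le t \le b,$$

with initial conditions $y(a) = \alpha_1, y'(a) = \alpha_2, \ldots, y^{(m-1)}(a) = \alpha_m$ can be converted into a
system of equations in the form (5.45) and (5.46).

Let $u_1(t) = y(t)$, $u_2(t) = y'(t)$, ..., and $u_m(t) = y^{(m-1)}(t)$. This produces the first-order
system

$$\frac{du_1}{dt} = \frac{dy}{dt} = u_2, \quad \frac{du_2}{dt} = \frac{dy'}{dt} = u_3, \quad \ldots, \quad \frac{du_{m-1}}{dt} = \frac{dy^{(m-2)}}{dt} = u_m,$$

and

$$\frac{du_m}{dt} = \frac{dy^{(m-1)}}{dt} = y^{(m)} = f(t, y, y', \ldots, y^{(m-1)}) = f(t, u_1, u_2, \ldots, u_m),$$

with initial conditions

$$u_1(a) = y(a) = \alpha_1, \quad u_2(a) = y'(a) = \alpha_2, \quad \ldots, \quad u_m(a) = y^{(m-1)}(a) = \alpha_m.$$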
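The relabeling above is mechanical, so it can be captured in a small helper. The sketch below is illustrative only: `as_first_order_system` and `rk4_step` are hypothetical names, and the sample right-hand side is the second-order equation $y'' = e^{2t}\sin t - 2y + 2y'$ treated in the example that follows in the text.

```python
import math


def as_first_order_system(f, m):
    """Right-hand sides for u_1 = y, ..., u_m = y^(m-1), where y^(m) = f(t, y, y', ...)."""
    def shift(i):
        return lambda t, u: u[i + 1]            # u_i' = u_{i+1}
    fs = [shift(i) for i in range(m - 1)]
    fs.append(lambda t, u: f(t, *u))            # u_m' = f(t, u_1, ..., u_m)
    return fs


def rk4_step(fs, t, w, h):
    """One Runge-Kutta order-four step for the system u_i' = fs[i](t, u)."""
    k1 = [h * g(t, w) for g in fs]
    k2 = [h * g(t + h / 2, [wi + ki / 2 for wi, ki in zip(w, k1)]) for g in fs]
    k3 = [h * g(t + h / 2, [wi + ki / 2 for wi, ki in zip(w, k2)]) for g in fs]
    k4 = [h * g(t + h, [wi + ki for wi, ki in zip(w, k3)]) for g in fs]
    return [wi + (p + 2 * q + 2 * r + s) / 6
            for wi, p, q, r, s in zip(w, k1, k2, k3, k4)]


def rhs(t, y, yp):
    # y'' = e^{2t} sin t - 2y + 2y'
    return math.exp(2 * t) * math.sin(t) - 2 * y + 2 * yp


fs = as_first_order_system(rhs, 2)
w1 = rk4_step(fs, 0.0, [-0.4, -0.6], 0.1)       # approximations to y(0.1), y'(0.1)
```

The helper `shift` is used instead of a bare lambda inside the comprehension so that each function captures its own index `i`.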
Example 1 Transform the second-order initial-value problem

$$y'' - 2y' + 2y = e^{2t}\sin t, \quad \text{for } 0 \le t \le 1, \quad \text{with } y(0) = -0.4,\ y'(0) = -0.6$$

into a system of first-order initial-value problems, and use the Runge-Kutta method with
$h = 0.1$ to approximate the solution.

Solution Let $u_1(t) = y(t)$ and $u_2(t) = y'(t)$. This transforms the second-order equation
into the system

$$u_1'(t) = u_2(t),$$
$$u_2'(t) = e^{2t}\sin t - 2u_1(t) + 2u_2(t),$$

with initial conditions $u_1(0) = -0.4$, $u_2(0) = -0.6$.

The initial conditions give $w_{1,0} = -0.4$ and $w_{2,0} = -0.6$. The Runge-Kutta Eqs. (5.49)
through (5.52) on page 330 with $j = 0$ give

$$k_{1,1} = hf_1(t_0, w_{1,0}, w_{2,0}) = hw_{2,0} = -0.06,$$

$$k_{1,2} = hf_2(t_0, w_{1,0}, w_{2,0}) = h\left[e^{2t_0}\sin t_0 - 2w_{1,0} + 2w_{2,0}\right] = -0.04,$$

$$k_{2,1} = hf_1\left(t_0 + \frac{h}{2},\ w_{1,0} + \frac{1}{2}k_{1,1},\ w_{2,0} + \frac{1}{2}k_{1,2}\right) = h\left[w_{2,0} + \frac{1}{2}k_{1,2}\right] = -0.062,$$

$$k_{2,2} = hf_2\left(t_0 + \frac{h}{2},\ w_{1,0} + \frac{1}{2}k_{1,1},\ w_{2,0} + \frac{1}{2}k_{1,2}\right)
= h\left[e^{2(t_0 + 0.05)}\sin(t_0 + 0.05) - 2\left(w_{1,0} + \frac{1}{2}k_{1,1}\right) + 2\left(w_{2,0} + \frac{1}{2}k_{1,2}\right)\right]
= -0.03247644757,$$

$$k_{3,1} = h\left[w_{2,0} + \frac{1}{2}k_{2,2}\right] = -0.06162382238,$$

$$k_{3,2} = h\left[e^{2(t_0 + 0.05)}\sin(t_0 + 0.05) - 2\left(w_{1,0} + \frac{1}{2}k_{2,1}\right) + 2\left(w_{2,0} + \frac{1}{2}k_{2,2}\right)\right]
= -0.03152409237,$$

$$k_{4,1} = h\left[w_{2,0} + k_{3,2}\right] = -0.06315240924,$$

and

$$k_{4,2} = h\left[e^{2(t_0 + 0.1)}\sin(t_0 + 0.1) - 2(w_{1,0} + k_{3,1}) + 2(w_{2,0} + k_{3,2})\right] = -0.02178637298.$$

So

$$w_{1,1} = w_{1,0} + \frac{1}{6}(k_{1,1} + 2k_{2,1} + 2k_{3,1} + k_{4,1}) = -0.4617333423$$

and

$$w_{2,1} = w_{2,0} + \frac{1}{6}(k_{1,2} + 2k_{2,2} + 2k_{3,2} + k_{4,2}) = -0.6316312421.$$

The value $w_{1,1}$ approximates $u_1(0.1) = y(0.1) = 0.2e^{2(0.1)}(\sin 0.1 - 2\cos 0.1)$, and
$w_{2,1}$ approximates $u_2(0.1) = y'(0.1) = 0.2e^{2(0.1)}(4\sin 0.1 - 3\cos 0.1)$.

The set of values $w_{1,j}$ and $w_{2,j}$, for $j = 0, 1, \ldots, 10$, are presented in Table 5.20 and
are compared to the actual values of $u_1(t) = 0.2e^{2t}(\sin t - 2\cos t)$ and $u_2(t) = u_1'(t) =
0.2e^{2t}(4\sin t - 3\cos t)$.

Table 5.20

$t_j$    $y(t_j) = u_1(t_j)$    $w_{1,j}$    $y'(t_j) = u_2(t_j)$    $w_{2,j}$    $|y(t_j) - w_{1,j}|$    $|y'(t_j) - w_{2,j}|$
0.0 -0.40000000 -0.40000000 -0.6000000 -0.60000000 0 0
0.1 -0.46173297 -0.46173334 -0.6316304 -0.63163124 3.7 x 10^{-7} 7.75 x 10^{-7}
0.2 -0.52555905 -0.52555988 -0.6401478 -0.64014895 8.3 x 10^{-7} 1.01 x 10^{-6}
0.3 -0.58860005 -0.58860144 -0.6136630 -0.61366381 1.39 x 10^{-6} 8.34 x 10^{-7}
0.4 -0.64661028 -0.64661231 -0.5365821 -0.53658203 2.03 x 10^{-6} 1.79 x 10^{-7}
0.5 -0.69356395 -0.69356666 -0.3887395 -0.38873810 2.71 x 10^{-6} 5.96 x 10^{-7}
0.6 -0.72114849 -0.72115190 -0.1443834 -0.14438087 3.41 x 10^{-6} 7.75 x 10^{-7}
0.7 -0.71814890 -0.71815295 0.2289917 0.22899702 4.05 x 10^{-6} 2.03 x 10^{-6}
0.8 -0.66970677 -0.66971133 0.7719815 0.77199180 4.56 x 10^{-6} 5.30 x 10^{-6}
0.9 -0.55643814 -0.55644290 1.534764 1.5347815 4.76 x 10^{-6} 9.54 x 10^{-6}
1.0 -0.35339436 -0.35339886 2.578741 2.5787663 4.50 x 10^{-6} 1.34 x 10^{-5}

(Margin note: In Maple the $n$th derivative $y^{(n)}(t)$ is specified by (D@@n)(y)(t).)

We can also use dsolve from Maple on higher-order equations. To define the differential
equation in Example 1, use

    def2 := (D@@2)(y)(t) - 2*D(y)(t) + 2*y(t) = exp(2*t)*sin(t)

and to specify the initial conditions use

    init2 := y(0) = -0.4, D(y)(0) = -0.6

The solution is obtained with the command

    sol2 := dsolve({def2, init2}, y(t))

to obtain

$$y(t) = \frac{1}{5}e^{2t}(\sin(t) - 2\cos(t))$$

We isolate the solution in function form using

    g := rhs(sol2)

To obtain $y(1.0) = g(1.0)$, enter

    evalf(subs(t = 1.0, g))

which gives $-0.3533943574$.

Runge-Kutta-Fehlberg is also available for higher-order equations via the dsolve com- 
mand with the numeric option. It is employed in the same manner as illustrated for systems 
of equations. 

The other one-step methods can be extended to systems in a similar way. When error
control methods like the Runge-Kutta-Fehlberg method are extended, each component of
the numerical solution $(w_{1j}, w_{2j}, \ldots, w_{mj})$ must be examined for accuracy. If any of the
components fail to be sufficiently accurate, the entire numerical solution $(w_{1j}, w_{2j}, \ldots, w_{mj})$
must be recomputed.
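The componentwise acceptance rule can be sketched as follows. This is an assumption-laden illustration, not the text's algorithm: `w_low` and `w_high` stand for two approximations of different order at the same mesh point (as in an embedded pair such as Runge-Kutta-Fehlberg), and the per-unit-step scaling by `h` is one common convention chosen here for the sketch.

```python
def max_component_error(w_low, w_high):
    """Largest componentwise difference between two approximations."""
    return max(abs(lo - hi) for lo, hi in zip(w_low, w_high))


def accept_step(w_low, w_high, h, tol):
    """Accept the step only if every component passes the error test.

    A single inaccurate component makes the maximum exceed tol, so the
    whole vector (w_1j, ..., w_mj) is rejected and must be recomputed.
    """
    return max_component_error(w_low, w_high) / h <= tol
```

Taking the maximum over components is what makes a single inaccurate component sufficient to reject the entire step.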

The multistep methods and predictor-corrector techniques can also be extended to 
systems. Again, if error control is used, each component must be accurate. The extension 
of the extrapolation technique to systems can also be done, but the notation becomes quite 
involved. If this topic is of interest, see [HNW1]. 

Convergence theorems and error estimates for systems are similar to those considered 
in Section 5.10 for the single equations, except that the bounds are given in terms of vector 
norms, a topic considered in Chapter 7. (A good reference for these theorems is [Gel], 
pp. 45-72.) 
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EXERCISE SET 59 


1. Use the Runge-Kutta method for systems to approximate the solutions of the following systems of 
first-order differential equations, and compare the results to the actual solutions. 
a ui, = 3u, + 2m) — (2? + le", (0) = 1; 
uy = 4u, + + (+ 2t-He*, wm(O)=1; O<t<1l; A=0.2; 
actual solutions u(t) = te - ze! +e" and wm(t)= er + et + Pe. 
b. ui, = —4u, — 2m + cost+4sint, (0) =0; 
uy = 3u,;+u,—3sint, m(O)=—-l, O<t<2; A=O0.1; 
actual solutions u(t) = 2e' — 2e-** + sint and u(t) = —3e' + 2e*. 
Cc uy =u, (0) =1; 
uy = —u,—2e'+ 1, u(0) =0; 
uz=—um—-e+1, w(QO)=1; O<t<2; A=O05; 
actual solutions u,(t) = cost +sint—e'+1, w(t) = —sint+ cost — e’, and uw3(t) = 
—sint+ cost. 
d. uy =m-—u34+t, u(0)=1; 
uy = 37, w(0)=1; 
u,=iuwt+e", w3(0)=—-1l; O<t<1l; A=01; 
actual solutions u,(t) = —0.05° + 0.254+14+2-—e7%, w(t) = P41, and w(t) = 
0.254 +t-—e7. 
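2. Use the Runge-Kutta method for systems to approximate the solutions of the following systems of 
first-order differential equations, and compare the results to the actual solutions. 
a. $u_1' = u_1 - u_2 + 2$, $u_1(0) = -1$; 
$u_2' = -u_1 + u_2 + 4t$, $u_2(0) = 0$; $0 \le t \le 1$; $h = 0.1$; 
actual solutions $u_1(t) = -\frac{1}{2}e^{2t} + t^2 + 2t - \frac{1}{2}$ and $u_2(t) = \frac{1}{2}e^{2t} + t^2 - \frac{1}{2}$. 
b. $u_1' = \frac{1}{9}u_1 - \frac{2}{3}u_2 - \frac{1}{9}t^2 + \frac{2}{3}$, $u_1(0) = -3$; 
$u_2' = u_2 + 3t - 4$, $u_2(0) = 5$; $0 \le t \le 2$; $h = 0.2$; 
actual solutions $u_1(t) = -3e^{t} + t^2$ and $u_2(t) = 4e^{t} - 3t + 1$. 
c. $u_1' = u_1 + 2u_2 - 2u_3 + e^{-t}$, $u_1(0) = 3$; 
$u_2' = u_2 + u_3 - 2e^{-t}$, $u_2(0) = -1$; 
$u_3' = u_1 + 2u_2 + e^{-t}$, $u_3(0) = 1$; $0 \le t \le 1$; $h = 0.1$; 
actual solutions $u_1(t) = -3e^{-t} - 3\sin t + 6\cos t$, $u_2(t) = \frac{3}{2}e^{-t} + \frac{3}{10}\sin t - \frac{21}{10}\cos t - \frac{2}{5}e^{2t}$, 
and $u_3(t) = -e^{-t} + \frac{12}{5}\cos t + \frac{9}{5}\sin t - \frac{2}{5}e^{2t}$. 
d. $u_1' = 3u_1 + 2u_2 - u_3 - 1 - 3t - 2\sin t$, $u_1(0) = 5$; 
$u_2' = u_1 - 2u_2 + 3u_3 + 6 - t + 2\sin t + \cos t$, $u_2(0) = -9$; 
$u_3' = 2u_1 + 4u_3 + 8 - 2t$, $u_3(0) = -5$; $0 \le t \le 2$; $h = 0.2$; 
actual solutions $u_1(t) = 2e^{3t} + 3e^{-2t} + t$, $u_2(t) = -8e^{-2t} + e^{4t} - 2e^{3t} + \sin t$, and 
$u_3(t) = 2e^{4t} - 4e^{3t} - e^{-2t} - 2$. 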
3. Use the Runge-Kutta for Systems Algorithm to approximate the solutions of the following higher- 
order differential equations, and compare the results to the actual solutions. 
a. $y'' - 2y' + y = te^{t} - t$, $0 \le t \le 1$, $y(0) = y'(0) = 0$, with $h = 0.1$; 
actual solution $y(t) = \frac{1}{6}t^3e^{t} - te^{t} + 2e^{t} - t - 2$. 
b. $t^2y'' - 2ty' + 2y = t^3 \ln t$, $1 \le t \le 2$, $y(1) = 1$, $y'(1) = 0$, with $h = 0.1$; 
actual solution $y(t) = \frac{7}{4}t + \frac{1}{2}t^3 \ln t - \frac{3}{4}t^3$. 
c. $y''' + 2y'' - y' - 2y = e^{t}$, $0 \le t \le 3$, $y(0) = 1$, $y'(0) = 2$, $y''(0) = 0$, with $h = 0.2$; 
actual solution $y(t) = \frac{43}{36}e^{t} + \frac{1}{4}e^{-t} - \frac{4}{9}e^{-2t} + \frac{1}{6}te^{t}$. 
d. $t^3y''' - t^2y'' + 3ty' - 4y = 5t^3 \ln t + 9t^3$, $1 \le t \le 2$, $y(1) = 0$, $y'(1) = 1$, $y''(1) = 3$, 
with $h = 0.1$; actual solution $y(t) = -t^2 + t\cos(\ln t) + t\sin(\ln t) + t^3 \ln t$. 
4. Use the Runge-Kutta for Systems Algorithm to approximate the solutions of the following higher- 
order differential equations, and compare the results to the actual solutions. 
a. $y'' - 3y' + 2y = 6e^{-t}$, $0 \le t \le 1$, $y(0) = y'(0) = 2$, with $h = 0.1$; 
actual solution $y(t) = 2e^{2t} - e^{t} + e^{-t}$. 
b. $t^2y'' + ty' - 4y = -3t$, $1 \le t \le 3$, $y(1) = 4$, $y'(1) = 3$, with $h = 0.2$; 
actual solution $y(t) = 2t^2 + t + t^{-2}$. 
c. $y''' + y'' - 4y' - 4y = 0$, $0 \le t \le 2$, $y(0) = 3$, $y'(0) = -1$, $y''(0) = 9$, with $h = 0.2$; 
actual solution $y(t) = e^{-t} + e^{2t} + e^{-2t}$. 
d. $t^3y''' + t^2y'' - 2ty' + 2y = 8t^3 - 2$, $1 \le t \le 2$, $y(1) = 2$, $y'(1) = 8$, $y''(1) = 6$, with 
$h = 0.1$; actual solution $y(t) = 2t - t^{-1} + t^2 + t^3 - 1$. 
5. Change the Adams Fourth-Order Predictor-Corrector Algorithm to obtain approximate solutions to 
systems of first-order equations. 
6. Repeat Exercise 2 using the algorithm developed in Exercise 5. 
7. Repeat Exercise 1 using the algorithm developed in Exercise 5. 


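8. Suppose the swinging pendulum described in the lead example of this chapter is 2 ft long and that 
$g = 32.17$ ft/s$^2$. With $h = 0.1$ s, compare the angle $\theta$ obtained for the following two initial-value 
problems at $t = 0$, 1, and 2 s. 

a. $\dfrac{d^2\theta}{dt^2} + \dfrac{g}{L}\sin\theta = 0$, $\theta(0) = \dfrac{\pi}{6}$, $\theta'(0) = 0$. 
b. $\dfrac{d^2\theta}{dt^2} + \dfrac{g}{L}\theta = 0$, $\theta(0) = \dfrac{\pi}{6}$, $\theta'(0) = 0$. 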


9. The study of mathematical models for predicting the population dynamics of competing species has 
its origin in independent works published in the early part of the 20th century by A. J. Lotka and 
V. Volterra (see, for example, [Lo1], [Lo2], and [Vo]). 

Consider the problem of predicting the population of two species, one of which is a predator, 
whose population at time $t$ is $x_2(t)$, feeding on the other, which is the prey, whose population is $x_1(t)$. 
We will assume that the prey always has an adequate food supply and that its birth rate at any time 
is proportional to the number of prey alive at that time; that is, birth rate (prey) is $k_1x_1(t)$. The death 
rate of the prey depends on both the number of prey and predators alive at that time. For simplicity, 
we assume death rate (prey) $= k_2x_1(t)x_2(t)$. The birth rate of the predator, on the other hand, depends 
on its food supply, $x_1(t)$, as well as on the number of predators available for reproduction purposes. 
For this reason, we assume that the birth rate (predator) is $k_3x_1(t)x_2(t)$. The death rate of the predator 
will be taken as simply proportional to the number of predators alive at the time; that is, death rate 
(predator) $= k_4x_2(t)$. 

Since $x_1'(t)$ and $x_2'(t)$ represent the change in the prey and predator populations, respectively, 
with respect to time, the problem is expressed by the system of nonlinear differential equations 

\[ x_1'(t) = k_1x_1(t) - k_2x_1(t)x_2(t) \quad\text{and}\quad x_2'(t) = k_3x_1(t)x_2(t) - k_4x_2(t). \]

Solve this system for $0 \le t \le 4$, assuming that the initial population of the prey is 1000 and of the 
predators is 500 and that the constants are $k_1 = 3$, $k_2 = 0.002$, $k_3 = 0.0006$, and $k_4 = 0.5$. Sketch a 
graph of the solutions to this problem, plotting both populations with time, and describe the physical 
phenomena represented. Is there a stable solution to this population model? If so, for what values $x_1$ 
and $x_2$ is the solution stable? 


10. In Exercise 9 we considered the problem of predicting the population in a predator-prey model. 
Another problem of this type is concerned with two species competing for the same food supply. If 
the numbers of species alive at time $t$ are denoted by $x_1(t)$ and $x_2(t)$, it is often assumed that, although 
the birth rate of each of the species is simply proportional to the number of species alive at that time, 
the death rate of each species depends on the population of both species. We will assume that the 
population of a particular pair of species is described by the equations 

\[ \frac{dx_1(t)}{dt} = x_1(t)[4 - 0.0003x_1(t) - 0.0004x_2(t)] \quad\text{and}\quad \frac{dx_2(t)}{dt} = x_2(t)[2 - 0.0002x_1(t) - 0.0001x_2(t)]. \]

If it is known that the initial population of each species is 10,000, find the solution to this system for 
$0 \le t \le 4$. Is there a stable solution to this population model? If so, for what values of $x_1$ and $x_2$ is 
the solution stable? 


5.10 Stability 


A number of methods have been presented in this chapter for approximating the solution 
to an initial-value problem. Although numerous other techniques are available, we have 
chosen the methods described here because they generally satisfied three criteria: 

• Their development is clear enough so that you can understand how and why they work. 

• One or more of the methods will give satisfactory results for most of the problems that 
are encountered by students in science and engineering. 

• Most of the more advanced and complex techniques are based on one or a combination 
of the procedures described here. 


One-Step Methods 


In this section, we discuss why these methods are expected to give satisfactory results when 
some similar methods do not. Before we begin this discussion, we need to present two 
definitions concerned with the convergence of one-step difference-equation methods to the 
solution of the differential equation as the step size decreases. 


Definition 5.18 A one-step difference-equation method with local truncation error $\tau_i(h)$ at the $i$th step is 
said to be consistent with the differential equation it approximates if 

\[ \lim_{h \to 0} \max_{1 \le i \le N} |\tau_i(h)| = 0. \]

Informally, a one-step method is consistent if the difference equation for the method 
approaches the differential equation as the step size goes to zero. 

Note that this definition is a local definition since, for each of the values $\tau_i(h)$, we 
are assuming that the approximation $w_{i-1}$ and the exact solution $y(t_{i-1})$ are the same. A 
more realistic means of analyzing the effects of making $h$ small is to determine the global 
effect of the method. This is the maximum error of the method over the entire range of the 
approximation, assuming only that the method gives the exact result at the initial value. 


Definition 5.19 A one-step difference-equation method is said to be convergent with respect to the 
differential equation it approximates if 

\[ \lim_{h \to 0} \max_{1 \le i \le N} |w_i - y(t_i)| = 0, \]

where $y(t_i)$ denotes the exact value of the solution of the differential equation and $w_i$ is the 
approximation obtained from the difference method at the $i$th step. 

Informally, a method is convergent if the solution to the difference equation approaches 
the solution to the differential equation as the step size goes to zero. 


Example 1 Show that Euler's method is convergent. 


Solution Examining Inequality (5.10) on page 271, in the error-bound formula for Euler's 
method, we see that under the hypotheses of Theorem 5.9, 

\[ \max_{1 \le i \le N} |w_i - y(t_i)| \le \frac{Mh}{2L}\left| e^{L(b-a)} - 1 \right|. \]

However, $M$, $L$, $a$, and $b$ are all constants and 

\[ \lim_{h \to 0} \max_{1 \le i \le N} |w_i - y(t_i)| \le \lim_{h \to 0} \frac{Mh}{2L}\left| e^{L(b-a)} - 1 \right| = 0. \]

So Euler's method is convergent with respect to a differential equation satisfying the 
conditions of this definition. The rate of convergence is $O(h)$. 
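The $O(h)$ rate in Example 1 can be observed numerically. The sketch below is only an illustration under stated assumptions: the test problem $y' = y - t^2 + 1$, $y(0) = 0.5$ on $[0, 2]$, with exact solution $y(t) = (t+1)^2 - 0.5e^{t}$, and the helper names `euler` and `max_error` are choices made here, not part of the text. If the maximum error is $O(h)$, halving $h$ should roughly halve it.

```python
import math

def euler(f, a, b, alpha, n):
    """Euler's method with n equal steps; returns w_0, ..., w_n."""
    h = (b - a) / n
    t, w = a, alpha
    ws = [w]
    for _ in range(n):
        w = w + h * f(t, w)
        t = t + h
        ws.append(w)
    return ws

def max_error(n):
    """Maximum of |w_i - y(t_i)| over the mesh for the assumed test problem."""
    f = lambda t, y: y - t * t + 1.0                     # assumed right-hand side
    exact = lambda t: (t + 1.0) ** 2 - 0.5 * math.exp(t)  # its exact solution
    h = 2.0 / n
    ws = euler(f, 0.0, 2.0, 0.5, n)
    return max(abs(w - exact(i * h)) for i, w in enumerate(ws))

# Halve the step size three times and compare successive maximum errors.
errors = [max_error(n) for n in (20, 40, 80, 160)]
ratios = [errors[k] / errors[k + 1] for k in range(3)]
print(errors)
print(ratios)  # ratios near 2 are consistent with O(h) convergence
```

The ratios approach 2 as $h$ decreases, which is what a first-order global error predicts.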


A consistent one-step method has the property that the difference equation for the 
method approaches the differential equation when the step size goes to zero. So the local 
truncation error of a consistent method approaches zero as the step size approaches zero. 

The other error-bound type of problem that exists when using difference methods to 
approximate solutions to differential equations is a consequence of not using exact results. 
In practice, neither the initial conditions nor the arithmetic that is subsequently performed 
is represented exactly because of the round-off error associated with finite-digit arithmetic. 
In Section 5.2 we saw that this consideration can lead to difficulties even for the convergent 
Euler’s method. 

To analyze this situation, at least partially, we will try to determine which methods are 
stable, in the sense that small changes or perturbations in the initial conditions produce 
correspondingly small changes in the subsequent approximations; that is, the results 
depend continuously on the initial data. 

The concept of stability of a one-step difference equation is somewhat analogous to 
the condition of a differential equation being well-posed, so it is not surprising that the 
Lipschitz condition appears here, as it did in the corresponding theorem for differential 
equations, Theorem 5.6 in Section 5.1. 

Part (i) of the following theorem concerns the stability of a one-step method. The 
proof of this result is not difficult and is considered in Exercise 1. Part (ii) of Theorem 5.20 
concerns sufficient conditions for a consistent method to be convergent. Part (iii) justifies the 
remark made in Section 5.5 about controlling the global error of a method by controlling 
its local truncation error and implies that when the local truncation error has the rate of 
convergence $O(h^n)$, the global error will have the same rate of convergence. The proofs of 
parts (ii) and (iii) are more difficult than that of part (i), and can be found within the material 
presented in [Ge1], pp. 57-58. 


Theorem 5.20 Suppose the initial-value problem 

\[ y' = f(t, y), \quad a \le t \le b, \quad y(a) = \alpha, \]

is approximated by a one-step difference method in the form 

\[ w_0 = \alpha, \qquad w_{i+1} = w_i + h\phi(t_i, w_i, h). \]

Suppose also that a number $h_0 > 0$ exists and that $\phi(t, w, h)$ is continuous and satisfies a 
Lipschitz condition in the variable $w$ with Lipschitz constant $L$ on the set 

\[ D = \{(t, w, h) \mid a \le t \le b \text{ and } -\infty < w < \infty,\ 0 \le h \le h_0\}. \]

Then 

(i) The method is stable; 

(ii) The difference method is convergent if and only if it is consistent, which is 
equivalent to 

\[ \phi(t, y, 0) = f(t, y), \quad \text{for all } a \le t \le b; \]

(iii) If a function $\tau$ exists and, for each $i = 1, 2, \ldots, N$, the local truncation error 
$\tau_i(h)$ satisfies $|\tau_i(h)| \le \tau(h)$ whenever $0 \le h \le h_0$, then 

\[ |y(t_i) - w_i| \le \frac{\tau(h)}{L} e^{L(t_i - a)}. \]


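Example 2 The Modified Euler method is given by $w_0 = \alpha$, 

\[ w_{i+1} = w_i + \frac{h}{2}\left[ f(t_i, w_i) + f(t_{i+1}, w_i + hf(t_i, w_i)) \right], \quad \text{for } i = 0, 1, \ldots, N-1. \]

Verify that this method is stable by showing that it satisfies the hypothesis of Theorem 5.20. 

Solution For this method, 

\[ \phi(t, w, h) = \frac{1}{2}f(t, w) + \frac{1}{2}f(t + h, w + hf(t, w)). \]

If $f$ satisfies a Lipschitz condition on $\{(t, w) \mid a \le t \le b \text{ and } -\infty < w < \infty\}$ in the 
variable $w$ with constant $L$, then, since 

\[ \phi(t, w, h) - \phi(t, \bar{w}, h) = \frac{1}{2}f(t, w) + \frac{1}{2}f(t + h, w + hf(t, w)) - \frac{1}{2}f(t, \bar{w}) - \frac{1}{2}f(t + h, \bar{w} + hf(t, \bar{w})), \]

the Lipschitz condition on $f$ leads to 

\begin{align*}
|\phi(t, w, h) - \phi(t, \bar{w}, h)| &\le \frac{1}{2}L|w - \bar{w}| + \frac{1}{2}L\left| w + hf(t, w) - \bar{w} - hf(t, \bar{w}) \right| \\
&\le L|w - \bar{w}| + \frac{1}{2}L\left| hf(t, w) - hf(t, \bar{w}) \right| \\
&\le L|w - \bar{w}| + \frac{1}{2}hL^2|w - \bar{w}| \\
&= \left( L + \frac{1}{2}hL^2 \right)|w - \bar{w}|.
\end{align*}

Therefore, $\phi$ satisfies a Lipschitz condition in $w$ on the set 

\[ \{(t, w, h) \mid a \le t \le b,\ -\infty < w < \infty, \text{ and } 0 \le h \le h_0\}, \]

for any $h_0 > 0$ with constant 

\[ L' = L + \frac{1}{2}h_0L^2. \]

Finally, if $f$ is continuous on $\{(t, w) \mid a \le t \le b,\ -\infty < w < \infty\}$, then $\phi$ is 
continuous on 

\[ \{(t, w, h) \mid a \le t \le b,\ -\infty < w < \infty, \text{ and } 0 \le h \le h_0\}; \]

so Theorem 5.20 implies that the Modified Euler method is stable. Letting $h = 0$, we have 

\[ \phi(t, w, 0) = \frac{1}{2}f(t, w) + \frac{1}{2}f(t + 0, w + 0 \cdot f(t, w)) = f(t, w), \]

so the consistency condition expressed in Theorem 5.20, part (ii), holds. Thus, the method 
is convergent. Moreover, we have seen that for this method the local truncation error is 
$O(h^2)$, so the convergence of the Modified Euler method is also $O(h^2)$. 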
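The two conclusions for the Modified Euler method, the consistency condition $\phi(t, w, 0) = f(t, w)$ and the $O(h^2)$ rate, can be checked numerically. The following is a minimal sketch, not the text's algorithm listing: the function names `phi`, `modified_euler`, and `observed_order`, and the test problem $y' = y - t^2 + 1$, $y(0) = 0.5$ on $[0, 2]$, are assumptions made for illustration. The order is estimated from the error at the right endpoint only.

```python
import math

def phi(f, t, w, h):
    """Increment function of the Modified Euler method."""
    return 0.5 * f(t, w) + 0.5 * f(t + h, w + h * f(t, w))

def modified_euler(f, a, b, alpha, n):
    """Apply w_{i+1} = w_i + h * phi(t_i, w_i, h) for n steps; return w_n."""
    h = (b - a) / n
    t, w = a, alpha
    for _ in range(n):
        w = w + h * phi(f, t, w, h)
        t += h
    return w

def observed_order(f, exact_b, a, b, alpha, n):
    """Estimate p in error ~ C h^p from the endpoint errors with n and 2n steps."""
    e1 = abs(modified_euler(f, a, b, alpha, n) - exact_b)
    e2 = abs(modified_euler(f, a, b, alpha, 2 * n) - exact_b)
    return math.log2(e1 / e2)

# Assumed test problem with known solution y(t) = (t + 1)^2 - 0.5 e^t.
f = lambda t, y: y - t * t + 1.0
exact_b = 9.0 - 0.5 * math.exp(2.0)

consistency_gap = phi(f, 0.3, 1.2, 0.0) - f(0.3, 1.2)  # phi(t, w, 0) - f(t, w)
order = observed_order(f, exact_b, 0.0, 2.0, 0.5, 40)
print(consistency_gap, order)
```

An estimated order close to 2 agrees with the $O(h^2)$ convergence stated in the example.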


Multistep Methods 


For multistep methods, the problems involved with consistency, convergence, and stability 
are compounded because of the number of approximations involved at each step. In the 
one-step methods, the approximation $w_{i+1}$ depends directly only on the previous approximation 
$w_i$, whereas the multistep methods use at least two of the previous approximations, and the 
usual methods that are employed involve more. 

The general multistep method for approximating the solution to the initial-value 
problem 

\[ y' = f(t, y), \quad a \le t \le b, \quad y(a) = \alpha, \tag{5.54} \]

has the form 

\begin{align*}
w_0 &= \alpha, \quad w_1 = \alpha_1, \quad \ldots, \quad w_{m-1} = \alpha_{m-1}, \\
w_{i+1} &= a_{m-1}w_i + a_{m-2}w_{i-1} + \cdots + a_0w_{i+1-m} + hF(t_i, h, w_{i+1}, w_i, \ldots, w_{i+1-m}), \tag{5.55}
\end{align*}

for each $i = m-1, m, \ldots, N-1$, where $a_0, a_1, \ldots, a_{m-1}$ are constants and, as usual, 
$h = (b - a)/N$ and $t_i = a + ih$. 

The local truncation error for a multistep method expressed in this form is 

\[ \tau_{i+1}(h) = \frac{y(t_{i+1}) - a_{m-1}y(t_i) - \cdots - a_0y(t_{i+1-m})}{h} - F(t_i, h, y(t_{i+1}), y(t_i), \ldots, y(t_{i+1-m})), \]

for each $i = m-1, m, \ldots, N-1$. As in the one-step methods, the local truncation 
error measures how the solution $y$ to the differential equation fails to satisfy the difference 
equation. 

For the four-step Adams-Bashforth method, we have seen that 

\[ \tau_{i+1}(h) = \frac{251}{720}y^{(5)}(\mu_i)h^4, \quad \text{for some } \mu_i \in (t_{i-3}, t_{i+1}), \]

whereas the three-step Adams-Moulton method has 

\[ \tau_{i+1}(h) = -\frac{19}{720}y^{(5)}(\mu_i)h^4, \quad \text{for some } \mu_i \in (t_{i-2}, t_{i+1}), \]

provided, of course, that $y \in C^5[a, b]$. 

Throughout the analysis, two assumptions will be made concerning the function $F$: 

• If $f \equiv 0$ (that is, if the differential equation is homogeneous), then $F \equiv 0$ also. 

• $F$ satisfies a Lipschitz condition with respect to $\{w_j\}$, in the sense that a constant $L$ exists 
and, for every pair of sequences $\{v_j\}_{j=0}^{N}$ and $\{\tilde{v}_j\}_{j=0}^{N}$ and for $i = m-1, m, \ldots, N-1$, we 
have 

\[ |F(t_i, h, v_{i+1}, \ldots, v_{i+1-m}) - F(t_i, h, \tilde{v}_{i+1}, \ldots, \tilde{v}_{i+1-m})| \le L\sum_{j=0}^{m} |v_{i+1-j} - \tilde{v}_{i+1-j}|. \]

The explicit Adams-Bashforth and implicit Adams-Moulton methods satisfy both of 
these conditions, provided $f$ satisfies a Lipschitz condition. (See Exercise 2.) 

The concept of convergence for multistep methods is the same as that for one-step 
methods. 
• A multistep method is convergent if the solution to the difference equation approaches 
the solution to the differential equation as the step size approaches zero. This means that 
$\lim_{h \to 0} \max_{0 \le i \le N} |w_i - y(t_i)| = 0$. 


For consistency, however, a slightly different situation occurs. Again, we want a 
multistep method to be consistent provided that the difference equation approaches the differential 
equation as the step size approaches zero; that is, the local truncation error approaches zero 
at each step as the step size approaches zero. The additional condition occurs because of 
the number of starting values required for multistep methods. Since usually only the first 
starting value, $w_0 = \alpha$, is exact, we need to require that the errors in all the starting values 
$\{\alpha_i\}$ approach zero as the step size approaches zero. So 

\[ \lim_{h \to 0} |\tau_i(h)| = 0, \quad \text{for all } i = m, m+1, \ldots, N \quad\text{and} \tag{5.56} \]

\[ \lim_{h \to 0} |\alpha_i - y(t_i)| = 0, \quad \text{for all } i = 1, 2, \ldots, m-1, \tag{5.57} \]

must be true for a multistep method in the form (5.55) to be consistent. Note that (5.57) 
implies that a multistep method will not be consistent unless the one-step method generating 
the starting values is also consistent. 

The following theorem for multistep methods is similar to Theorem 5.20, part (iii), 
and gives a relationship between the local truncation error and global error of a multistep 
method. It provides the theoretical justification for attempting to control global error by 
controlling local truncation error. The proof of a slightly more general form of this theorem 
can be found in [IK], pp. 387-388. 


Theorem 5.21 Suppose the initial-value problem 

\[ y' = f(t, y), \quad a \le t \le b, \quad y(a) = \alpha, \]

is approximated by an explicit Adams predictor-corrector method with an $m$-step 
Adams-Bashforth predictor equation 

\[ w_{i+1} = w_i + h[b_{m-1}f(t_i, w_i) + \cdots + b_0f(t_{i+1-m}, w_{i+1-m})], \]

with local truncation error $\tau_{i+1}(h)$, and an $(m-1)$-step implicit Adams-Moulton corrector 
equation 

\[ w_{i+1} = w_i + h\left[ \tilde{b}_{m-1}f(t_{i+1}, w_{i+1}) + \tilde{b}_{m-2}f(t_i, w_i) + \cdots + \tilde{b}_0f(t_{i+2-m}, w_{i+2-m}) \right], \]

with local truncation error $\tilde{\tau}_{i+1}(h)$. In addition, suppose that $f(t, y)$ and $f_y(t, y)$ are 
continuous on $D = \{(t, y) \mid a \le t \le b \text{ and } -\infty < y < \infty\}$ and that $f_y$ is bounded. Then the local 
truncation error $\sigma_{i+1}(h)$ of the predictor-corrector method is 

\[ \sigma_{i+1}(h) = \tilde{\tau}_{i+1}(h) + \tau_{i+1}(h)\,\tilde{b}_{m-1}\frac{\partial f}{\partial y}(t_{i+1}, \theta_{i+1}), \]

where $\theta_{i+1}$ is a number between zero and $h\tau_{i+1}(h)$. 

Moreover, there exist constants $k_1$ and $k_2$ such that 

\[ |w_i - y(t_i)| \le \left[ \max_{0 \le j \le m-1} |w_j - y(t_j)| + k_1\sigma(h) \right] e^{k_2(t_i - a)}, \]

where $\sigma(h) = \max_{m \le j \le N} |\sigma_j(h)|$. 
Before discussing connections between consistency, convergence, and stability for 
multistep methods, we need to consider in more detail the difference equation for a multistep 
method. In doing so, we will discover the reason for choosing the Adams methods as our 
standard multistep methods. 

Associated with the difference equation (5.55) given at the beginning of this discussion, 

\begin{align*}
w_0 &= \alpha, \quad w_1 = \alpha_1, \quad \ldots, \quad w_{m-1} = \alpha_{m-1}, \\
w_{i+1} &= a_{m-1}w_i + a_{m-2}w_{i-1} + \cdots + a_0w_{i+1-m} + hF(t_i, h, w_{i+1}, w_i, \ldots, w_{i+1-m}),
\end{align*}

is a polynomial, called the characteristic polynomial of the method, given by 

\[ P(\lambda) = \lambda^m - a_{m-1}\lambda^{m-1} - a_{m-2}\lambda^{m-2} - \cdots - a_1\lambda - a_0. \tag{5.58} \]

The stability of a multistep method with respect to round-off error is dictated by the 
magnitudes of the zeros of the characteristic polynomial. To see this, consider applying the 
standard multistep method (5.55) to the trivial initial-value problem 

\[ y' \equiv 0, \quad y(a) = \alpha, \quad \text{where } \alpha \ne 0. \tag{5.59} \]

This problem has exact solution $y(t) \equiv \alpha$. By examining Eqs. (5.27) and (5.28) in Section 
5.6 (see page 304), we can see that any multistep method will, in theory, produce the exact 
solution $w_n = \alpha$ for all $n$. The only deviation from the exact solution is due to the round-off 
error of the method. 

The right side of the differential equation in (5.59) has $f(t, y) \equiv 0$, so by the first 
assumption on $F$, we have $F(t_i, h, w_{i+1}, w_i, \ldots, w_{i+1-m}) = 0$ in the difference equation (5.55). As a 
consequence, the standard form of the difference equation becomes 

\[ w_{i+1} = a_{m-1}w_i + a_{m-2}w_{i-1} + \cdots + a_0w_{i+1-m}. \tag{5.60} \]

Suppose $\lambda$ is one of the zeros of the characteristic polynomial associated with (5.55). 
Then $w_n = \lambda^n$ for each $n$ is a solution to (5.60) since 

\[ \lambda^{i+1} - a_{m-1}\lambda^{i} - a_{m-2}\lambda^{i-1} - \cdots - a_0\lambda^{i+1-m} = \lambda^{i+1-m}\left[ \lambda^m - a_{m-1}\lambda^{m-1} - \cdots - a_0 \right] = 0. \]

In fact, if $\lambda_1, \lambda_2, \ldots, \lambda_m$ are distinct zeros of the characteristic polynomial for (5.55), it can 
be shown that every solution to (5.60) can be expressed in the form 

\[ w_n = \sum_{i=1}^{m} c_i\lambda_i^n, \tag{5.61} \]

for some unique collection of constants $c_1, c_2, \ldots, c_m$. 

Since the exact solution to (5.59) is $y(t) = \alpha$, the choice $w_n = \alpha$, for all $n$, is a solution 
to (5.60). Using this fact in (5.60) gives 

\[ 0 = \alpha - \alpha a_{m-1} - \alpha a_{m-2} - \cdots - \alpha a_0 = \alpha[1 - a_{m-1} - a_{m-2} - \cdots - a_0]. \]

This implies that $\lambda = 1$ is one of the zeros of the characteristic polynomial (5.58). We will 
assume that in the representation (5.61) this solution is described by $\lambda_1 = 1$ and $c_1 = \alpha$, so 
all solutions to (5.60) are expressed as 

\[ w_n = \alpha + \sum_{i=2}^{m} c_i\lambda_i^n. \tag{5.62} \]

If all the calculations were exact, all the constants $c_2, c_3, \ldots, c_m$ would be zero. In practice, 
the constants $c_2, c_3, \ldots, c_m$ are not zero due to round-off error. In fact, the round-off error 
grows exponentially unless $|\lambda_i| \le 1$ for each of the roots $\lambda_2, \lambda_3, \ldots, \lambda_m$. The smaller the 
magnitude of these roots, the more stable the method with respect to the growth of round-off 
error. 

In deriving (5.62), we made the simplifying assumption that the zeros of the 
characteristic polynomial are distinct. The situation is similar when multiple zeros occur. For 
example, if $\lambda_k = \lambda_{k+1} = \cdots = \lambda_{k+p}$ for some $k$ and $p$, it simply requires replacing the sum 

\[ c_k\lambda_k^n + c_{k+1}\lambda_{k+1}^n + \cdots + c_{k+p}\lambda_{k+p}^n \]

in (5.62) with 

\[ c_k\lambda_k^n + c_{k+1}n\lambda_k^{n-1} + c_{k+2}n(n-1)\lambda_k^{n-2} + \cdots + c_{k+p}[n(n-1)\cdots(n-p+1)]\lambda_k^{n-p}. \tag{5.63} \]

(See [He2], pp. 119-145.) Although the form of the solution is modified, the round-off error 
if $|\lambda_k| > 1$ still grows exponentially. 

Although we have considered only the special case of approximating initial-value 
problems of the form (5.59), the stability characteristics for this equation determine the 
stability for the situation when $f(t, y)$ is not identically zero. This is because the solution to 
the homogeneous equation (5.59) is embedded in the solution to any equation. The following 
definitions are motivated by this discussion. 


Definition 5.22 Let $\lambda_1, \lambda_2, \ldots, \lambda_m$ denote the (not necessarily distinct) roots of the characteristic equation 

\[ P(\lambda) = \lambda^m - a_{m-1}\lambda^{m-1} - \cdots - a_1\lambda - a_0 = 0 \]

associated with the multistep difference method 

\begin{align*}
w_0 &= \alpha, \quad w_1 = \alpha_1, \quad \ldots, \quad w_{m-1} = \alpha_{m-1}, \\
w_{i+1} &= a_{m-1}w_i + a_{m-2}w_{i-1} + \cdots + a_0w_{i+1-m} + hF(t_i, h, w_{i+1}, w_i, \ldots, w_{i+1-m}).
\end{align*}

If $|\lambda_i| \le 1$, for each $i = 1, 2, \ldots, m$, and all roots with absolute value 1 are simple roots, 
then the difference method is said to satisfy the root condition. 


Definition 5.23 (i) Methods that satisfy the root condition and have $\lambda = 1$ as the only root of the 
characteristic equation with magnitude one are called strongly stable. 

(ii) Methods that satisfy the root condition and have more than one distinct root with 
magnitude one are called weakly stable. 

(iii) Methods that do not satisfy the root condition are called unstable. 
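Definitions 5.22 and 5.23 translate directly into a numerical check on the zeros of the characteristic polynomial. The sketch below is one possible way to do this and rests on assumptions: the function name `classify_multistep`, the tolerances `tol` and `sep`, and the use of NumPy's `numpy.roots` are choices made here, not part of the text. Floating-point root finding is unreliable for repeated roots, so the result should be read as a heuristic classification rather than a proof.

```python
import numpy as np

def classify_multistep(a, tol=1e-6, sep=1e-4):
    """Classify a multistep method from a = [a_0, a_1, ..., a_{m-1}], the
    coefficients in w_{i+1} = a_{m-1} w_i + ... + a_0 w_{i+1-m} + h F(...).

    tol: slack allowed when comparing a root magnitude with 1 (assumed value).
    sep: roots of magnitude one closer than this are treated as repeated.
    """
    m = len(a)
    # P(lambda) = lambda^m - a_{m-1} lambda^{m-1} - ... - a_0, highest power first.
    coeffs = [1.0] + [-a[k] for k in range(m - 1, -1, -1)]
    roots = np.roots(coeffs)
    mags = np.abs(roots)
    if np.any(mags > 1.0 + tol):
        return "unstable"                      # a root lies outside the unit circle
    unit = roots[np.abs(mags - 1.0) <= tol]    # roots of magnitude one
    for i in range(len(unit)):
        for j in range(i + 1, len(unit)):
            if abs(unit[i] - unit[j]) <= sep:
                return "unstable"              # repeated root of magnitude one
    if len(unit) == 1 and abs(unit[0] - 1.0) <= tol:
        return "strongly stable"
    if len(unit) > 1:
        return "weakly stable"
    return "root condition holds (not classified by Definition 5.23)"

print(classify_multistep([0, 0, 0, 1]))  # P = lambda^4 - lambda^3
print(classify_multistep([1, 0, 0, 0]))  # P = lambda^4 - 1
print(classify_multistep([-3, 4]))       # P = lambda^2 - 4 lambda + 3
```

The three sample calls correspond to characteristic polynomials with zeros $\{1, 0, 0, 0\}$, $\{1, -1, i, -i\}$, and $\{1, 3\}$, respectively.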


Consistency and convergence of a multistep method are closely related to the round-off 
stability of the method. The next theorem details these connections. For the proof of this 
result and the theory on which it is based, see [IK], pp. 410-417. 


Theorem 5.24 A multistep method of the form 

\begin{align*}
w_0 &= \alpha, \quad w_1 = \alpha_1, \quad \ldots, \quad w_{m-1} = \alpha_{m-1}, \\
w_{i+1} &= a_{m-1}w_i + a_{m-2}w_{i-1} + \cdots + a_0w_{i+1-m} + hF(t_i, h, w_{i+1}, w_i, \ldots, w_{i+1-m})
\end{align*}

is stable if and only if it satisfies the root condition. Moreover, if the difference method 
is consistent with the differential equation, then the method is stable if and only if it is 
convergent. 
Example 3 The fourth-order Adams-Bashforth method can be expressed as 

\[ w_{i+1} = w_i + hF(t_i, h, w_{i+1}, w_i, \ldots, w_{i-3}), \]

where 

\[ F(t_i, h, w_{i+1}, \ldots, w_{i-3}) = \frac{1}{24}\left[ 55f(t_i, w_i) - 59f(t_{i-1}, w_{i-1}) + 37f(t_{i-2}, w_{i-2}) - 9f(t_{i-3}, w_{i-3}) \right]. \]

Show that this method is strongly stable. 

Solution In this case we have $m = 4$, $a_0 = 0$, $a_1 = 0$, $a_2 = 0$, and $a_3 = 1$, so the 
characteristic equation for this Adams-Bashforth method is 

\[ 0 = P(\lambda) = \lambda^4 - \lambda^3 = \lambda^3(\lambda - 1). \]

This polynomial has roots $\lambda_1 = 1$, $\lambda_2 = 0$, $\lambda_3 = 0$, and $\lambda_4 = 0$. Hence it satisfies the root 
condition and is strongly stable. 

The Adams-Moulton method has a similar characteristic polynomial, $P(\lambda) = \lambda^3 - \lambda^2$, 
with zeros $\lambda_1 = 1$, $\lambda_2 = 0$, and $\lambda_3 = 0$, and is also strongly stable. 


Example 4 Show that the fourth-order Milne's method, the explicit multistep method given by 

\[ w_{i+1} = w_{i-3} + \frac{4h}{3}\left[ 2f(t_i, w_i) - f(t_{i-1}, w_{i-1}) + 2f(t_{i-2}, w_{i-2}) \right], \]

satisfies the root condition, but it is only weakly stable. 

Solution The characteristic equation for this method, $0 = P(\lambda) = \lambda^4 - 1$, has four roots 
with magnitude one: $\lambda_1 = 1$, $\lambda_2 = -1$, $\lambda_3 = i$, and $\lambda_4 = -i$. Because all the roots have 
magnitude 1 and each is a simple root, the method satisfies the root condition. However, 
more than one root has magnitude 1, so the method is only weakly stable. 


Example 5 Apply the strongly stable fourth-order Adams-Bashforth method and the weakly stable 
Milne's method with $h = 0.1$ to the initial-value problem 

\[ y' = -6y + 6, \quad 0 \le t \le 1, \quad y(0) = 2, \]

which has the exact solution $y(t) = 1 + e^{-6t}$. 

Solution The results in Table 5.21 show the effects of a weakly stable method versus a 
strongly stable method for this problem. 


Table 5.21 

                          Adams-Bashforth                 Milne's 
             Exact        Method        Error             Method        Error 
t_i          y(t_i)       w_i           |y_i - w_i|       w_i           |y_i - w_i| 

0.10000000   1.5488116    1.5488116                       1.5488116 
0.20000000   1.3011942    1.3011942                       1.3011942 
0.30000000   1.1652989    1.1652989                       1.1652989 
0.40000000   1.0907180    1.0996236     8.906 x 10^-3     1.0983785     7.661 x 10^-3 
0.50000000   1.0497871    1.0513350     1.548 x 10^-3     1.0417344     8.053 x 10^-3 
0.60000000   1.0273237    1.0425614     1.524 x 10^-2     1.0486438     2.132 x 10^-2 
0.70000000   1.0149956    1.0047990     1.020 x 10^-2     0.9634506     5.154 x 10^-2 
0.80000000   1.0082297    1.0359090     2.768 x 10^-2     1.1289977     1.208 x 10^-1 
0.90000000   1.0045166    0.9657936     3.872 x 10^-2     0.7282684     2.762 x 10^-1 
1.00000000   1.0024788    1.0709304     6.845 x 10^-2     1.6450917     6.426 x 10^-1 
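The comparison in Example 5 can be reproduced with a few lines of code. This is a sketch under assumptions rather than the text's algorithm: the function name `run` and the method labels are illustrative, and the starting values $w_0, \ldots, w_3$ are taken from the exact solution, which matches the first rows of Table 5.21.

```python
import math

def f(t, y):
    return -6.0 * y + 6.0

def exact(t):
    return 1.0 + math.exp(-6.0 * t)

def run(method, h=0.1, n=10):
    """Return w_0, ..., w_n for y' = -6y + 6, y(0) = 2, using exact starting values."""
    w = [exact(i * h) for i in range(4)]  # w_0..w_3 assumed exact
    for i in range(3, n):
        # fi[k] = f(t_{i-k}, w_{i-k}) for k = 0, 1, 2, 3
        fi = [f((i - k) * h, w[i - k]) for k in range(4)]
        if method == "adams-bashforth":
            step = h / 24.0 * (55 * fi[0] - 59 * fi[1] + 37 * fi[2] - 9 * fi[3])
            w.append(w[i] + step)
        else:  # Milne's explicit method
            step = 4.0 * h / 3.0 * (2 * fi[0] - fi[1] + 2 * fi[2])
            w.append(w[i - 3] + step)
    return w

ab = run("adams-bashforth")
milne = run("milne")
for i in range(4, 11):
    t = i * 0.1
    print(f"{t:.1f}  {exact(t):.7f}  {ab[i]:.7f}  {milne[i]:.7f}")
```

Both methods have fourth-order local truncation error, yet the error of Milne's method grows noticeably faster across the interval, as the table indicates.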
The reason for choosing the Adams-Bashforth-Moulton method as our standard fourth-order 
predictor-corrector technique in Section 5.6 over the Milne-Simpson method of the same 
order is that both the Adams-Bashforth and Adams-Moulton methods are strongly stable. 
They are more likely to give accurate approximations to a wider class of problems than 
is the predictor-corrector based on the Milne and Simpson techniques, both of which are 
weakly stable. 


EXERCISE SET 5.10 


1. To prove Theorem 5.20, part (i), show that the hypotheses imply that there exists a constant $K > 0$ 
such that 

\[ |u_i - v_i| \le K|u_0 - v_0|, \quad \text{for each } 1 \le i \le N, \]

whenever $\{u_i\}_{i=1}^{N}$ and $\{v_i\}_{i=1}^{N}$ satisfy the difference equation $w_{i+1} = w_i + h\phi(t_i, w_i, h)$. 

2. For the Adams-Bashforth and Adams-Moulton methods of order four, 
a. Show that if $f \equiv 0$, then 

\[ F(t_i, h, w_{i+1}, \ldots, w_{i+1-m}) = 0. \]

b. Show that if $f$ satisfies a Lipschitz condition with constant $L$, then a constant $C$ exists with 

\[ |F(t_i, h, w_{i+1}, \ldots, w_{i+1-m}) - F(t_i, h, v_{i+1}, \ldots, v_{i+1-m})| \le C\sum_{j=0}^{m} |w_{i+1-j} - v_{i+1-j}|. \]

3. Use the results of Exercise 32 in Section 5.4 to show that the Runge-Kutta method of order four is 
consistent. 

4. Consider the differential equation 

\[ y' = f(t, y), \quad a \le t \le b, \quad y(a) = \alpha. \]

a. Show that 

\[ y'(t_i) = \frac{-3y(t_i) + 4y(t_{i+1}) - y(t_{i+2})}{2h} + \frac{h^2}{3}y'''(\xi_i), \]

for some $\xi_i$, where $t_i < \xi_i < t_{i+2}$. 

b. Part (a) suggests the difference method 

\[ w_{i+2} = 4w_{i+1} - 3w_i - 2hf(t_i, w_i), \quad \text{for } i = 0, 1, \ldots, N-2. \]

Use this method to solve 

\[ y' = 1 - y, \quad 0 \le t \le 1, \quad y(0) = 0, \]

with $h = 0.1$. Use the starting values $w_0 = 0$ and $w_1 = y(t_1) = 1 - e^{-0.1}$. 

c. Repeat part (b) with $h = 0.01$ and $w_1 = 1 - e^{-0.01}$. 
d. Analyze this method for consistency, stability, and convergence. 

5. Given the multistep method 

\[ w_{i+1} = -\frac{3}{2}w_i + 3w_{i-1} - \frac{1}{2}w_{i-2} + 3hf(t_i, w_i), \quad \text{for } i = 2, \ldots, N-1, \]

with starting values $w_0$, $w_1$, $w_2$: 
a. Find the local truncation error. 
b. Comment on consistency, stability, and convergence. 

6. Obtain an approximate solution to the differential equation 

\[ y' = -y, \quad 0 \le t \le 10, \quad y(0) = 1 \]

using Milne's method with $h = 0.1$ and then $h = 0.01$, with starting values $w_0 = 1$ and $w_1 = e^{-h}$ in 
both cases. How does decreasing $h$ from $h = 0.1$ to $h = 0.01$ affect the number of correct digits in 
the approximate solutions at $t = 1$ and $t = 10$? 

7. Investigate stability for the difference method 

\[ w_{i+1} = -4w_i + 5w_{i-1} + 2h[f(t_i, w_i) + 2hf(t_{i-1}, w_{i-1})], \]

for $i = 1, 2, \ldots, N-1$, with starting values $w_0$, $w_1$. 

8. Consider the problem $y' = 0$, $0 \le t \le 10$, $y(0) = 0$, which has the solution $y \equiv 0$. If the difference 
method of Exercise 4 is applied to the problem, then 

\[ w_{i+1} = 4w_i - 3w_{i-1}, \quad \text{for } i = 1, 2, \ldots, N-1, \]

$w_0 = 0$, and $w_1 = \alpha_1$. 

Suppose $w_1 = \alpha_1 = \varepsilon$, where $\varepsilon$ is a small rounding error. Compute $w_i$ exactly for $i = 2, 3, \ldots, 6$ to 
find how the error $\varepsilon$ is propagated. 


5.11 Stiff Differential Equations 


All the methods for approximating the solution to initial-value problems have error terms that 
involve a higher derivative of the solution of the equation. If the derivative can be reasonably 
bounded, then the method will have a predictable error bound that can be used to estimate the 
accuracy of the approximation. Even if the derivative grows as the steps increase, the error 
can be kept in relative control, provided that the solution also grows in magnitude. Problems 
frequently arise, however, when the magnitude of the derivative increases but the solution 
does not. In this situation, the error can grow so large that it dominates the calculations. 
Initial-value problems for which this is likely to occur are called stiff equations and are 
quite common, particularly in the study of vibrations, chemical reactions, and electrical 
circuits. 

Stiff differential equations are characterized as those whose exact solution has a term 
of the form $e^{-ct}$, where $c$ is a large positive constant. This is usually only a part of the 
solution, called the transient solution. The more important portion of the solution is called 
the steady-state solution. The transient portion of a stiff equation will rapidly decay to zero 
as $t$ increases, but since the $n$th derivative of this term has magnitude $c^ne^{-ct}$, the derivative 
does not decay as quickly. In fact, since the derivative in the error term is evaluated not 
at $t$, but at a number between zero and $t$, the derivative terms can increase as $t$ increases, 
and very rapidly indeed. Fortunately, stiff equations generally can be predicted from the 
physical problem from which the equation is derived and, with care, the error can be kept 
under control. The manner in which this is done is considered in this section. 


Stiff systems derive their name from the motion of spring and mass systems that have large 
spring constants. 


Illustration The system of initial-value problems 

\begin{align*}
u_1' &= 9u_1 + 24u_2 + 5\cos t - \frac{1}{3}\sin t, \quad u_1(0) = \frac{4}{3}, \\
u_2' &= -24u_1 - 51u_2 - 9\cos t + \frac{1}{3}\sin t, \quad u_2(0) = \frac{2}{3},
\end{align*}

has the unique solution 

\[ u_1(t) = 2e^{-3t} - e^{-39t} + \frac{1}{3}\cos t, \qquad u_2(t) = -e^{-3t} + 2e^{-39t} - \frac{1}{3}\cos t. \]

The transient term $e^{-39t}$ in the solution causes this system to be stiff. Applying Algorithm 
5.7, the Runge-Kutta Fourth-Order Method for Systems, gives results listed in Table 5.22. 
When $h = 0.05$, stability results and the approximations are accurate. Increasing the step 
size to $h = 0.1$, however, leads to the disastrous results shown in the table. 


Table 5.22 

                   w_1(t)       w_1(t)                      w_2(t)        w_2(t) 
t     u_1(t)       h = 0.05     h = 0.1       u_2(t)        h = 0.05      h = 0.1 
0.1   1.793061     1.712219     -2.645169     -1.032001     -0.8703152    7.844527 
0.2   1.423901     1.414070     -18.45158     -0.8746809    -0.8550148    38.87631 
0.3   1.131575     1.130523     -87.47221     -0.7249984    -0.7228910    176.4828 
0.4   0.9094086    0.9092763    -934.0722     -0.6082141    -0.6079475    789.3540 
0.5   0.7387877    0.7387506    -1760.016     -0.5156575    -0.5155810    3520.00 
0.6   0.6057094    0.6056833    -7848.550     -0.4404108    -0.4403558    15697.84 
0.7   0.4998603    0.4998361    -34989.63     -0.3774038    -0.3773540    69979.87 
0.8   0.4136714    0.4136490    -155979.4     -0.3229535    -0.3229078    311959.5 
0.9   0.3416143    0.3415939    -695332.0     -0.2744088    -0.2743673    1390664. 
1.0   0.2796748    0.2796568    -3099671.     -0.2298877    -0.2298511    6199352. 
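The behavior reported in Table 5.22 can be explored with a short program. The sketch below applies the classical fourth-order Runge-Kutta formulas to the system in the Illustration; it is not a transcription of Algorithm 5.7, and the function names `f` and `rk4_system` are illustrative assumptions.

```python
import math

def f(t, u):
    """Right-hand side of the stiff system in the Illustration."""
    u1, u2 = u
    return (9.0 * u1 + 24.0 * u2 + 5.0 * math.cos(t) - math.sin(t) / 3.0,
            -24.0 * u1 - 51.0 * u2 - 9.0 * math.cos(t) + math.sin(t) / 3.0)

def rk4_system(f, t0, u0, h, n):
    """Advance the system n steps of size h with classical RK4; return u at t0 + n*h."""
    t, u = t0, tuple(u0)
    for _ in range(n):
        k1 = f(t, u)
        k2 = f(t + h / 2, tuple(ui + h / 2 * ki for ui, ki in zip(u, k1)))
        k3 = f(t + h / 2, tuple(ui + h / 2 * ki for ui, ki in zip(u, k2)))
        k4 = f(t + h, tuple(ui + h * ki for ui, ki in zip(u, k3)))
        u = tuple(ui + h / 6 * (a + 2 * b + 2 * c + d)
                  for ui, a, b, c, d in zip(u, k1, k2, k3, k4))
        t += h
    return u

u0 = (4.0 / 3.0, 2.0 / 3.0)
fine = rk4_system(f, 0.0, u0, 0.05, 20)    # h = 0.05, approximation at t = 1
coarse = rk4_system(f, 0.0, u0, 0.1, 10)   # h = 0.1, approximation at t = 1
exact_u1 = 2 * math.exp(-3.0) - math.exp(-39.0) + math.cos(1.0) / 3.0
print(exact_u1, fine[0], coarse[0])
```

With $h = 0.05$ the approximation at $t = 1$ stays close to the exact value, while with $h = 0.1$ the transient component is amplified at every step and the computed values become enormous.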


Although stiffness is usually associated with systems of differential equations, the
approximation characteristics of a particular numerical method applied to a stiff system can
be predicted by examining the error produced when the method is applied to a simple test
equation,

$$y' = \lambda y, \quad y(0) = \alpha, \quad \text{where } \lambda < 0. \qquad (5.64)$$

The solution to this equation is $y(t) = \alpha e^{\lambda t}$, which contains the transient solution $e^{\lambda t}$. The
steady-state solution is zero, so the approximation characteristics of a method are easy to
determine. (A more complete discussion of the round-off error associated with stiff systems
requires examining the test equation when $\lambda$ is a complex number with negative real part;
see [Ge1], p. 222.)

First consider Euler’s method applied to the test equation. Letting $h = (b - a)/N$ and
$t_j = jh$, for $j = 0, 1, 2, \ldots, N$, Eq. (5.8) on page 266 implies that

$$w_0 = \alpha, \quad \text{and} \quad w_{j+1} = w_j + h(\lambda w_j) = (1 + h\lambda)w_j,$$

so

$$w_{j+1} = (1 + h\lambda)^{j+1} w_0 = (1 + h\lambda)^{j+1}\alpha, \quad \text{for } j = 0, 1, \ldots, N-1. \qquad (5.65)$$

Since the exact solution is $y(t) = \alpha e^{\lambda t}$, the absolute error is

$$|y(t_j) - w_j| = \left|e^{jh\lambda} - (1 + h\lambda)^j\right| |\alpha| = \left|(e^{h\lambda})^j - (1 + h\lambda)^j\right| |\alpha|,$$

and the accuracy is determined by how well the term $1 + h\lambda$ approximates $e^{h\lambda}$. When $\lambda < 0$,
the exact solution $(e^{h\lambda})^j$ decays to zero as $j$ increases, but by Eq. (5.65), the approximation
will have this property only if $|1 + h\lambda| < 1$, which implies that $-2 < h\lambda < 0$. This
effectively restricts the step size $h$ for Euler’s method to satisfy $h < 2/|\lambda|$.

Suppose now that a round-off error $\delta_0$ is introduced in the initial condition for Euler’s
method,

$$w_0 = \alpha + \delta_0.$$

At the $j$th step the round-off error is

$$\delta_j = (1 + h\lambda)^j \delta_0.$$

Since $\lambda < 0$, the condition for the control of the growth of round-off error is the same as
the condition for controlling the absolute error, $|1 + h\lambda| < 1$, which implies that $h < 2/|\lambda|$.
So

• Euler’s method is expected to be stable for

$$y' = \lambda y, \quad y(0) = \alpha, \quad \text{where } \lambda < 0,$$

only if the step size $h$ is less than $2/|\lambda|$.
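The step-size restriction can be observed directly. The following is a minimal Python sketch, not part of the text's algorithms: the function name `euler_test_equation` and the sample values $\lambda = -30$, $\alpha = 1/3$ are illustrative assumptions. It applies the recurrence $w_{j+1} = (1 + h\lambda)w_j$ once with $h$ below $2/|\lambda|$ and once with $h$ above it.

```python
def euler_test_equation(lam, alpha, h, steps):
    """Euler's method for y' = lam*y, y(0) = alpha; returns [w_0, ..., w_steps]."""
    w = [alpha]
    for _ in range(steps):
        # For the test equation, one Euler step is multiplication by (1 + h*lam).
        w.append((1 + h * lam) * w[-1])
    return w


lam, alpha = -30.0, 1.0 / 3.0          # illustrative values (assumption)
bound = 2 / abs(lam)                   # stability requires h < 2/|lam|

stable = euler_test_equation(lam, alpha, 0.05, 30)    # h = 0.05 < bound
unstable = euler_test_equation(lam, alpha, 0.1, 15)   # h = 0.1  > bound

print("bound on h:", bound)
print("h = 0.05, |w_30| =", abs(stable[-1]))
print("h = 0.10, |w_15| =", abs(unstable[-1]))
```

With $h = 0.05$ the factor $1 + h\lambda$ is $-0.5$ and the iterates shrink; with $h = 0.1$ it is $-2$ and they double in magnitude at every step, even though the exact solution decays.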


The situation is similar for other one-step methods. In general, a function $Q$ exists with
the property that the difference method, when applied to the test equation, gives

$$w_{j+1} = Q(h\lambda)w_j. \qquad (5.66)$$

The accuracy of the method depends upon how well $Q(h\lambda)$ approximates $e^{h\lambda}$, and the error
will grow without bound if $|Q(h\lambda)| > 1$. An $n$th-order Taylor method, for example, will
have stability with regard to both the growth of round-off error and absolute error, provided
$h$ is chosen to satisfy

$$\left|1 + h\lambda + \frac{1}{2}h^2\lambda^2 + \cdots + \frac{1}{n!}h^n\lambda^n\right| < 1.$$

Exercise 10 examines the specific case when the method is the classical fourth-order Runge-Kutta
method, which is essentially a Taylor method of order four.
When a multistep method of the form (5.54) is applied to the test equation, the result is

$$w_{j+1} = a_{m-1}w_j + \cdots + a_0 w_{j+1-m} + h\lambda\left(b_m w_{j+1} + b_{m-1}w_j + \cdots + b_0 w_{j+1-m}\right),$$

for $j = m-1, \ldots, N-1$, or

$$(1 - h\lambda b_m)w_{j+1} - (a_{m-1} + h\lambda b_{m-1})w_j - \cdots - (a_0 + h\lambda b_0)w_{j+1-m} = 0.$$

Associated with this homogeneous difference equation is a characteristic polynomial

$$Q(z, h\lambda) = (1 - h\lambda b_m)z^m - (a_{m-1} + h\lambda b_{m-1})z^{m-1} - \cdots - (a_0 + h\lambda b_0).$$

This polynomial is similar to the characteristic polynomial (5.58), but it also incorporates
the test equation. The theory here parallels the stability discussion in Section 5.10.

Suppose $w_0, \ldots, w_{m-1}$ are given, and, for fixed $h\lambda$, let $\beta_1, \ldots, \beta_m$ be the zeros of the
polynomial $Q(z, h\lambda)$. If $\beta_1, \ldots, \beta_m$ are distinct, then $c_1, \ldots, c_m$ exist with

$$w_j = \sum_{k=1}^{m} c_k(\beta_k)^j, \quad \text{for } j = 0, \ldots, N. \qquad (5.67)$$

If $Q(z, h\lambda)$ has multiple zeros, $w_j$ is similarly defined. (See Eq. (5.63) in Section 5.10.) If $w_j$
is to accurately approximate $y(t_j) = e^{(jh)\lambda} = (e^{h\lambda})^j$, then all zeros $\beta_k$ must satisfy $|\beta_k| < 1$;
otherwise, certain choices of $\alpha$ will result in $c_k \neq 0$, and the term $c_k(\beta_k)^j$ will not decay to
zero.


Illustration The test differential equation

$$y' = -30y, \quad 0 \le t \le 1.5, \quad y(0) = \tfrac{1}{3}$$

has exact solution $y = \tfrac{1}{3}e^{-30t}$. Using h = 0.1 for Euler’s Algorithm 5.1, Runge-Kutta
Fourth-Order Algorithm 5.2, and the Adams Predictor-Corrector Algorithm 5.4 gives the
results at t = 1.5 in Table 5.23.

Table 5.23

Exact solution                 9.54173 x 10^-21
Euler’s method                -1.09225 x 10^4
Runge-Kutta method             3.95730 x 10^1
Predictor-corrector method     8.03840 x 10^5
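The Euler and Runge-Kutta entries can be checked from Eq. (5.66) alone, because for the test equation each step multiplies the approximation by $Q(h\lambda)$. The Python sketch below is an illustration under that assumption; the helper names `q_euler` and `q_rk4` are hypothetical, and the predictor-corrector entry is not reproduced because it depends on starting values and a characteristic polynomial rather than a single factor.

```python
import math


def q_euler(z):
    """Amplification factor of Euler's method, z = h*lambda."""
    return 1 + z


def q_rk4(z):
    """Amplification factor of classical RK4 (the order-four Taylor polynomial)."""
    return 1 + z + z**2 / 2 + z**3 / 6 + z**4 / 24


lam, h, alpha, t_end = -30.0, 0.1, 1.0 / 3.0, 1.5
n = round(t_end / h)        # number of steps (15)
z = h * lam                 # h*lambda = -3

exact = alpha * math.exp(lam * t_end)
euler = alpha * q_euler(z) ** n
rk4 = alpha * q_rk4(z) ** n

print(f"exact = {exact:.5e}")
print(f"Euler = {euler:.5e}   |Q| = {abs(q_euler(z))}")
print(f"RK4   = {rk4:.5e}   |Q| = {abs(q_rk4(z))}")
```

In double precision the two computed values agree with the tabulated ones to roughly four significant digits; both factors, $|Q| = 2$ for Euler and $|Q| = 1.375$ for Runge-Kutta, exceed 1 at $h\lambda = -3$.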


The inaccuracies in the Illustration are due to the fact that $|Q(h\lambda)| > 1$ for Euler’s
method and the Runge-Kutta method and that $Q(z, h\lambda)$ has zeros with modulus exceeding
1 for the predictor-corrector method. To apply these methods to this problem, the step
size must be reduced. The following definition is used to describe the amount of step-size
reduction that is required.


Definition 5.25 The region $R$ of absolute stability for a one-step method is $R = \{h\lambda \in \mathbb{C} \mid |Q(h\lambda)| < 1\}$,
and for a multistep method, it is $R = \{h\lambda \in \mathbb{C} \mid |\beta_k| < 1, \text{ for all zeros } \beta_k \text{ of } Q(z, h\lambda)\}$.


Equations (5.66) and (5.67) imply that a method can be applied effectively to a stiff
equation only if $h\lambda$ is in the region of absolute stability of the method, which for a given
problem places a restriction on the size of $h$. Even though the exponential term in the exact
solution decays quickly to zero, $h\lambda$ must remain within the region of absolute stability
throughout the interval of $t$ values for the approximation to decay to zero and the growth of
error to be under control. This means that, although $h$ could normally be increased because
of truncation error considerations, the absolute stability criterion forces $h$ to remain small.
Variable step-size methods are especially vulnerable to this problem because an examination
of the local truncation error might indicate that the step size could increase. This could
inadvertently result in $h\lambda$ being outside the region of absolute stability.

The region of absolute stability of a method is generally the critical factor in producing
accurate approximations for stiff systems, so numerical methods are sought with as large
a region of absolute stability as possible. A numerical method is said to be A-stable if its
region $R$ of absolute stability contains the entire left half-plane.


This method is implicit because it
involves $w_{j+1}$ on both sides of the
equation.

The Implicit Trapezoidal method, given by

$$w_0 = \alpha, \qquad (5.68)$$

$$w_{j+1} = w_j + \frac{h}{2}\left[f(t_{j+1}, w_{j+1}) + f(t_j, w_j)\right], \quad 0 \le j \le N-1,$$

is an A-stable method (see Exercise 15) and is the only A-stable multistep method. Although
the Trapezoidal method does not give accurate approximations for large step sizes, its error
will not grow exponentially.
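A quick numerical check of this behavior is sketched below in Python. It relies on a fact not derived in the text: substituting $f(t, y) = \lambda y$ into (5.68) and solving for $w_{j+1}$ gives the factor $Q(h\lambda) = (2 + h\lambda)/(2 - h\lambda)$. The function names and the sample points are illustrative assumptions, and sampling a handful of points is evidence rather than a proof of A-stability.

```python
def q_trapezoidal(z):
    """Amplification factor of the Implicit Trapezoidal method, z = h*lambda."""
    return (2 + z) / (2 - z)


def q_euler(z):
    """Amplification factor of Euler's method, for comparison."""
    return 1 + z


# Sample points with negative real part (arbitrary illustrative choices).
samples = [complex(-x, y)
           for x in (0.01, 1.0, 30.0, 1e4)
           for y in (-50.0, 0.0, 50.0)]

trap_ok = all(abs(q_trapezoidal(z)) < 1 for z in samples)
euler_ok = all(abs(q_euler(z)) < 1 for z in samples)

print("Trapezoidal |Q| < 1 at every sample:", trap_ok)
print("Euler       |Q| < 1 at every sample:", euler_ok)
print("Trapezoidal Q(-3)   =", q_trapezoidal(-3.0))
print("Trapezoidal Q(-1e4) =", q_trapezoidal(-1e4))
```

The last line shows why accuracy suffers for large steps: as $h\lambda \to -\infty$ the factor approaches $-1$, so the approximation stays bounded but decays far more slowly than $e^{h\lambda}$.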
The techniques commonly used for stiff systems are implicit multistep methods. Generally
$w_{j+1}$ is obtained by solving a nonlinear equation or nonlinear system iteratively, often
by Newton’s method. Consider, for example, the Implicit Trapezoidal method

$$w_{j+1} = w_j + \frac{h}{2}\left[f(t_{j+1}, w_{j+1}) + f(t_j, w_j)\right].$$

Having computed $t_j$, $t_{j+1}$, and $w_j$, we need to determine $w_{j+1}$, the solution to

$$F(w) = w - w_j - \frac{h}{2}\left[f(t_{j+1}, w) + f(t_j, w_j)\right] = 0. \qquad (5.69)$$

To approximate this solution, select $w_{j+1}^{(0)}$, usually as $w_j$, and generate $w_{j+1}^{(k)}$ by applying
Newton’s method to (5.69),

$$w_{j+1}^{(k)} = w_{j+1}^{(k-1)} - \frac{F\left(w_{j+1}^{(k-1)}\right)}{F'\left(w_{j+1}^{(k-1)}\right)}
= w_{j+1}^{(k-1)} - \frac{w_{j+1}^{(k-1)} - w_j - \frac{h}{2}\left[f(t_j, w_j) + f\left(t_{j+1}, w_{j+1}^{(k-1)}\right)\right]}{1 - \frac{h}{2}f_y\left(t_{j+1}, w_{j+1}^{(k-1)}\right)}$$

until $\left|w_{j+1}^{(k)} - w_{j+1}^{(k-1)}\right|$ is sufficiently small. This is the procedure that is used in Algorithm
5.8. Normally only three or four iterations per step are required, because of the quadratic
convergence of Newton’s method.

The Secant method can be used as an alternative to Newton’s method in Eq. (5.69),
but then two distinct initial approximations to $w_{j+1}$ are required. To employ the Secant
method, the usual practice is to let $w_{j+1}^{(0)} = w_j$ and obtain $w_{j+1}^{(1)}$ from some explicit multistep
method. When a system of stiff equations is involved, a generalization is required for either
Newton’s or the Secant method. These topics are considered in Chapter 10.


Algorithm 5.8 Trapezoidal with Newton Iteration

To approximate the solution of the initial-value problem

$$y' = f(t, y), \quad \text{for } a \le t \le b, \quad \text{with } y(a) = \alpha$$

at (N + 1) equally spaced numbers in the interval [a, b]:

INPUT endpoints a, b; integer N; initial condition α; tolerance TOL; maximum number
of iterations M at any one step.

OUTPUT approximation w to y at the (N + 1) values of t or a message of failure.

Step 1 Set h = (b − a)/N;
           t = a;
           w = α;
       OUTPUT (t, w).

Step 2 For i = 1, 2, ..., N do Steps 3-7.

    Step 3 Set k1 = w + (h/2) f(t, w);
               w0 = k1;
               j = 1;
               FLAG = 0.

    Step 4 While FLAG = 0 do Steps 5-6.

        Step 5 Set w = w0 − [w0 − (h/2) f(t + h, w0) − k1] / [1 − (h/2) f_y(t + h, w0)].

        Step 6 If |w − w0| < TOL then set FLAG = 1
               else set j = j + 1;
                        w0 = w;
                    if j > M then
                        OUTPUT (‘The maximum number of iterations exceeded’);
                        STOP.

    Step 7 Set t = a + ih;
           OUTPUT (t, w).

Step 8 STOP.
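The steps above can be sketched in Python as follows. This is a minimal translation under stated assumptions rather than a reference implementation: the function name `trapezoidal_newton` and its signature are illustrative, failure is reported by raising an exception instead of printing a message, and the sample problem $y' = 5e^{5t}(y - t)^2 + 1$, $y(0) = -1$ (exact solution $y(t) = t - e^{-5t}$) is run with an assumed tolerance of $10^{-6}$ and $M = 10$.

```python
import math


def trapezoidal_newton(f, fy, a, b, alpha, n, tol, max_iter):
    """Implicit Trapezoidal method with Newton iteration (sketch of Algorithm 5.8).

    f(t, y) is the right-hand side and fy(t, y) its partial derivative in y.
    Returns a list of (t, w) pairs for the n + 1 mesh points.
    """
    h = (b - a) / n
    t, w = a, alpha
    out = [(t, w)]
    for i in range(1, n + 1):
        k1 = w + (h / 2) * f(t, w)      # Step 3: the part of (5.69) that is already known
        w0 = k1                         # initial Newton guess
        for _ in range(max_iter):       # Steps 4-6: at most max_iter Newton updates
            w = w0 - (w0 - (h / 2) * f(t + h, w0) - k1) / (1 - (h / 2) * fy(t + h, w0))
            if abs(w - w0) < tol:
                break
            w0 = w
        else:
            raise RuntimeError("The maximum number of iterations exceeded")
        t = a + i * h                   # Step 7
        out.append((t, w))
    return out


def f(t, y):
    return 5 * math.exp(5 * t) * (y - t) ** 2 + 1


def fy(t, y):
    return 10 * math.exp(5 * t) * (y - t)


# tol and max_iter here are assumed values for the sample run.
result = trapezoidal_newton(f, fy, 0.0, 1.0, -1.0, n=5, tol=1e-6, max_iter=10)
for t, w in result:
    print(f"{t:4.2f}  {w: .7f}  error = {abs(t - math.exp(-5 * t) - w):.4e}")
```

Supplying `fy` analytically keeps each Newton update to one evaluation of `f` and one of `fy`; a finite-difference derivative or the Secant method would be natural substitutes when `fy` is not available.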


Illustration The stiff initial-value problem

$$y' = 5e^{5t}(y - t)^2 + 1, \quad 0 \le t \le 1, \quad y(0) = -1$$

has solution $y(t) = t - e^{-5t}$. To show the effects of stiffness, the Implicit Trapezoidal method
and the Runge-Kutta fourth-order method are applied both with N = 4, giving h = 0.25,
and with N = 5, giving h = 0.20.


The Trapezoidal method performs well in both cases using M = 10 and TOL = $10^{-6}$,
as does Runge-Kutta with h = 0.2. However, h = 0.25 is outside the region of absolute
stability of the Runge-Kutta method, which is evident from the results in Table 5.24.


Table 5.24

          Runge-Kutta Method                    Trapezoidal Method
               h = 0.2                               h = 0.2
t_i     w_i               |y(t_i) - w_i|        w_i           |y(t_i) - w_i|
0.0     -1.0000000        0                     -1.0000000    0
0.2     -0.1488521        1.9027 x 10^-2        -0.1414969    2.6383 x 10^-2
0.4      0.2684884        3.8237 x 10^-3         0.2748614    1.0197 x 10^-2
0.6      0.5519927        1.7798 x 10^-3         0.5539828    3.7700 x 10^-3
0.8      0.7822857        6.0131 x 10^-4         0.7830720    1.3876 x 10^-3
1.0      0.9934905        2.2845 x 10^-4         0.9937726    5.1050 x 10^-4

               h = 0.25                              h = 0.25
t_i     w_i               |y(t_i) - w_i|        w_i           |y(t_i) - w_i|
0.0     -1.0000000        0                     -1.0000000    0
0.25     0.4014315        4.37936 x 10^-1        0.0054557    4.1961 x 10^-2
0.5      3.4374753        3.01956 x 10^0         0.4267572    8.8422 x 10^-3
0.75     1.44639 x 10^23  1.44639 x 10^23        0.7291528    2.6706 x 10^-3
1.0      Overflow                                0.9940199    7.5790 x 10^-4


We have presented here only a brief introduction to what a reader who frequently encounters
stiff differential equations should know. For further details, consult [Ge2], [Lam], or
[SGe].
EXERCISE SET 5.11

1. Solve the following stiff initial-value problems using Euler’s method, and compare the results with
   the actual solution.

   a. $y' = -9y$, $0 \le t \le 1$, $y(0) = e$, with h = 0.1; actual solution $y(t) = e^{1-9t}$.

   b. $y' = -20(y - t^2) + 2t$, $0 \le t \le 1$, $y(0) = \tfrac{1}{3}$, with h = 0.1; actual solution $y(t) = t^2 + \tfrac{1}{3}e^{-20t}$.

   c. $y' = -20y + 20\sin t + \cos t$, $0 \le t \le 2$, $y(0) = 1$, with h = 0.25; actual solution
      $y(t) = \sin t + e^{-20t}$.

   d. $y' = 50/y - 50y$, $0 \le t \le 1$, $y(0) = \sqrt{2}$, with h = 0.1; actual solution $y(t) = (1 + e^{-100t})^{1/2}$.

2. Solve the following stiff initial-value problems using Euler’s method, and compare the results with
   the actual solution.

   a. $y' = -5y + 6e^t$, $0 \le t \le 1$, $y(0) = 2$, with h = 0.1; actual solution $y(t) = e^{-5t} + e^t$.

   b. $y' = -10y + 10t + 1$, $0 \le t \le 1$, $y(0) = e$, with h = 0.1; actual solution $y(t) = e^{-10t+1} + t$.

   c. $y' = -15(y - t^{-3}) - 3/t^4$, $1 \le t \le 3$, $y(1) = 0$, with h = 0.25; actual solution
      $y(t) = -e^{-15t} + t^{-3}$.

   d. $y' = -20y + 20\cos t - \sin t$, $0 \le t \le 2$, $y(0) = 0$, with h = 0.25; actual solution
      $y(t) = -e^{-20t} + \cos t$.

3. Repeat Exercise 1 using the Runge-Kutta fourth-order method.

4. Repeat Exercise 2 using the Runge-Kutta fourth-order method.

5. Repeat Exercise 1 using the Adams fourth-order predictor-corrector method.

6. Repeat Exercise 2 using the Adams fourth-order predictor-corrector method.

7. Repeat Exercise 1 using the Trapezoidal Algorithm with TOL = $10^{-5}$.

8. Repeat Exercise 2 using the Trapezoidal Algorithm with TOL = $10^{-5}$.

9. Solve the following stiff initial-value problem using the Runge-Kutta fourth-order method with (a)
   h = 0.1 and (b) h = 0.025.

   $$u_1' = 32u_1 + 66u_2 + \tfrac{2}{3}t + \tfrac{2}{3}, \quad 0 \le t \le 0.5, \quad u_1(0) = \tfrac{1}{3};$$

   $$u_2' = -66u_1 - 133u_2 - \tfrac{1}{3}t - \tfrac{1}{3}, \quad 0 \le t \le 0.5, \quad u_2(0) = \tfrac{1}{3}.$$

   Compare the results to the actual solution,

   $$u_1(t) = \tfrac{2}{3}t + \tfrac{2}{3}e^{-t} - \tfrac{1}{3}e^{-100t} \quad \text{and} \quad u_2(t) = -\tfrac{1}{3}t - \tfrac{1}{3}e^{-t} + \tfrac{2}{3}e^{-100t}.$$

10. Show that the fourth-order Runge-Kutta method,

    $$k_1 = hf(t_i, w_i),$$
    $$k_2 = hf(t_i + h/2, w_i + k_1/2),$$
    $$k_3 = hf(t_i + h/2, w_i + k_2/2),$$
    $$k_4 = hf(t_i + h, w_i + k_3),$$
    $$w_{i+1} = w_i + \tfrac{1}{6}(k_1 + 2k_2 + 2k_3 + k_4),$$

    when applied to the differential equation $y' = \lambda y$, can be written in the form

    $$w_{i+1} = \left(1 + h\lambda + \tfrac{1}{2}(h\lambda)^2 + \tfrac{1}{6}(h\lambda)^3 + \tfrac{1}{24}(h\lambda)^4\right)w_i.$$

11. Discuss consistency, stability, and convergence for the Implicit Trapezoidal method

    $$w_{i+1} = w_i + \frac{h}{2}\left(f(t_{i+1}, w_{i+1}) + f(t_i, w_i)\right), \quad \text{for } i = 0, 1, \ldots, N-1,$$

    with $w_0 = \alpha$ applied to the differential equation

    $$y' = f(t, y), \quad a \le t \le b, \quad y(a) = \alpha.$$

12. The Backward Euler one-step method is defined by

    $$w_{i+1} = w_i + hf(t_{i+1}, w_{i+1}), \quad \text{for } i = 0, \ldots, N-1.$$

    Show that $Q(h\lambda) = 1/(1 - h\lambda)$ for the Backward Euler method.

13. Apply the Backward Euler method to the differential equations given in Exercise 1. Use Newton’s
    method to solve for $w_{i+1}$.

14. Apply the Backward Euler method to the differential equations given in Exercise 2. Use Newton’s
    method to solve for $w_{i+1}$.

15. a. Show that the Implicit Trapezoidal method is A-stable.
    b. Show that the Backward Euler method described in Exercise 12 is A-stable.


5.12 Survey of Methods and Software


In this chapter we have considered methods to approximate the solutions to initial-value 
problems for ordinary differential equations. We began with a discussion of the most elemen- 
tary numerical technique, Euler’s method. This procedure is not sufficiently accurate to be 
of use in applications, but it illustrates the general behavior of the more powerful techniques, 
without the accompanying algebraic difficulties. The Taylor methods were then considered 
as generalizations of Euler’s method. They were found to be accurate but cumbersome 
because of the need to determine extensive partial derivatives of the defining function of 
the differential equation. The Runge-Kutta formulas simplified the Taylor methods, without 
increasing the order of the error. To this point we had considered only one-step methods, 
techniques that use only data at the most recently computed point. 

Multistep methods are discussed in Section 5.6, where explicit methods of Adams- 
Bashforth type and implicit methods of Adams-Moulton type were considered. These cul- 
minate in predictor-corrector methods, which use an explicit method, such as an Adams- 
Bashforth, to predict the solution and then apply a corresponding implicit method, like an 
Adams-Moulton, to correct the approximation. 

Section 5.9 illustrated how these techniques can be used to solve higher-order initial- 
value problems and systems of initial-value problems. 

The more accurate adaptive methods are based on the relatively uncomplicated one-step 
and multistep techniques. In particular, we saw in Section 5.5 that the Runge-Kutta-Fehlberg 
method is a one-step procedure that seeks to select mesh spacing to keep the local error 
of the approximation under control. The Variable Step-Size Predictor-Corrector method 
presented in Section 5.7 is based on the four-step Adams-Bashforth method and three-step 
Adams-Moulton method. It also changes the step size to keep the local error within a given 
tolerance. The Extrapolation method discussed in Section 5.8 is based on a modification 
of the Midpoint method and incorporates extrapolation to maintain a desired accuracy of 
approximation. 

The final topic in the chapter concerned the difficulty that is inherent in the approximation
of the solution to a stiff equation, a differential equation whose exact solution contains
a portion of the form $e^{-\lambda t}$, where $\lambda$ is a positive constant. Special caution must be taken
with problems of this type, or the results can be overwhelmed by round-off error.

Methods of the Runge-Kutta-Fehlberg type are generally sufficient for nonstiff problems
when moderate accuracy is required. The extrapolation procedures are recommended
for nonstiff problems where high accuracy is required. Extensions of the Implicit Trapezoidal
method to variable-order and variable step-size implicit Adams-type methods are
used for stiff initial-value problems.

The IMSL Library includes two subroutines for approximating the solutions of initial-value
problems. Each of the methods solves a system of m first-order equations in m variables.
The equations are of the form

$$\frac{du_i}{dt} = f_i(t, u_1, u_2, \ldots, u_m), \quad \text{for } i = 1, 2, \ldots, m,$$

where $u_i(t_0)$ is given for each i. A variable step-size subroutine is based on the Runge-Kutta-Verner
fifth- and sixth-order methods described in Exercise 4 of Section 5.5. A subroutine of
Adams type is also available to be used for stiff equations based on a method of C. William
Gear. This method uses implicit multistep methods of order up to 12 and backward differentiation
formulas of order up to 5.

Runge-Kutta-type procedures contained in the NAG Library are based on the Merson 
form of the Runge-Kutta method. A variable-order and variable step-size Adams method 
is also in the library, as well as a variable-order, variable step-size backward-difference 
method for stiff systems. Other routines incorporate the same methods but iterate until a 
component of the solution attains a given value or until a function of the solution is zero. 

The netlib library includes several subroutines for approximating the solutions of initial- 
value problems in the package ODE. One subroutine is based on the Runge-Kutta-Verner 
fifth- and sixth-order methods, another on the Runge-Kutta-Fehlberg fourth- and fifth-order 
methods as described on page 297 of Section 5.5. A subroutine for stiff ordinary differential
equation initial-value problems is based on a variable coefficient backward differentiation
formula.

Many books specialize in the numerical solution of initial-value problems. Two classics
are by Henrici [He1] and Gear [Ge1]. Other books that survey the field are by Botha
and Pinder [BP], Ortega and Poole [OP], Golub and Ortega [GO], Shampine [Sh], and
Dormand [Do].

Two books by Hairer, Nørsett, and Wanner provide comprehensive discussions on nonstiff
[HNW1] and stiff [HNW2] problems. The book by Burrage [Bur] describes parallel
and sequential methods for solving systems of initial-value problems.
Direct Methods for Solving Linear Systems


Introduction


Kirchhoff’s laws of electrical circuits state that both the net flow of current through each
junction and the net voltage drop around each closed loop of a circuit are zero. Suppose
that a potential of V volts is applied between the points A and G in the circuit and that $i_1$, $i_2$,
$i_3$, $i_4$, and $i_5$ represent current flow as shown in the diagram. Using G as a reference point,
Kirchhoff’s laws imply that the currents satisfy the following system of linear equations:

$$5i_1 + 5i_2 = V,$$
$$i_3 - i_4 - i_5 = 0,$$
$$2i_4 - 3i_5 = 0,$$
$$i_1 - i_2 - i_3 = 0,$$
$$5i_2 - 7i_3 - 2i_4 = 0.$$

[Figure: circuit diagram with labeled points A, B, C, E, F, and G.]

The solution of systems of this type will be considered in this chapter. This application
is discussed in Exercise 29 of Section 6.6.

Linear systems of equations are associated with many problems in engineering and sci- 
ence, as well as with applications of mathematics to the social sciences and the quantitative 
study of business and economic problems. 

In this chapter we consider direct methods for solving a linear system of n equations
in n variables. Such a system has the form

$$\begin{aligned}
E_1 &: \; a_{11}x_1 + a_{12}x_2 + \cdots + a_{1n}x_n = b_1, \\
E_2 &: \; a_{21}x_1 + a_{22}x_2 + \cdots + a_{2n}x_n = b_2, \\
&\;\;\vdots \\
E_n &: \; a_{n1}x_1 + a_{n2}x_2 + \cdots + a_{nn}x_n = b_n.
\end{aligned} \qquad (6.1)$$

In this system we are given the constants $a_{ij}$, for each $i, j = 1, 2, \ldots, n$, and $b_i$, for each
$i = 1, 2, \ldots, n$, and we need to determine the unknowns $x_1, \ldots, x_n$.

Direct techniques are methods that theoretically give the exact solution to the system in
a finite number of steps. In practice, of course, the solution obtained will be contaminated by
the round-off error that is involved with the arithmetic being used. Analyzing the effect of
this round-off error and determining ways to keep it under control will be a major component
of this chapter.

A course in linear algebra is not assumed to be prerequisite for this chapter, so we 
will include a number of the basic notions of the subject. These results will also be used 
in Chapter 7, where we consider methods of approximating the solution to linear systems 
using iterative methods. 


6.1 Linear Systems of Equations


We use three operations to simplify the linear system given in (6.1):

1. Equation $E_i$ can be multiplied by any nonzero constant $\lambda$ with the resulting equation
   used in place of $E_i$. This operation is denoted $(\lambda E_i) \to (E_i)$.

2. Equation $E_j$ can be multiplied by any constant $\lambda$ and added to equation $E_i$ with
   the resulting equation used in place of $E_i$. This operation is denoted $(E_i + \lambda E_j) \to (E_i)$.

3. Equations $E_i$ and $E_j$ can be transposed in order. This operation is denoted $(E_i) \leftrightarrow (E_j)$.

By a sequence of these operations, a linear system will be systematically transformed into
a new linear system that is more easily solved and has the same solutions. The sequence
of operations is illustrated in the following.


Illustration The four equations

$$\begin{aligned}
E_1 &: \; x_1 + x_2 + 3x_4 = 4, \\
E_2 &: \; 2x_1 + x_2 - x_3 + x_4 = 1, \\
E_3 &: \; 3x_1 - x_2 - x_3 + 2x_4 = -3, \\
E_4 &: \; -x_1 + 2x_2 + 3x_3 - x_4 = 4,
\end{aligned} \qquad (6.2)$$

will be solved for $x_1$, $x_2$, $x_3$, and $x_4$. We first use equation $E_1$ to eliminate the unknown $x_1$
from equations $E_2$, $E_3$, and $E_4$ by performing $(E_2 - 2E_1) \to (E_2)$, $(E_3 - 3E_1) \to (E_3)$, and
$(E_4 + E_1) \to (E_4)$. For example, in the second equation

$$(E_2 - 2E_1) \to (E_2)$$

produces

$$(2x_1 + x_2 - x_3 + x_4) - 2(x_1 + x_2 + 3x_4) = 1 - 2(4),$$

which simplifies to the result shown as $E_2$ in

$$\begin{aligned}
E_1 &: \; x_1 + x_2 + 3x_4 = 4, \\
E_2 &: \; -x_2 - x_3 - 5x_4 = -7, \\
E_3 &: \; -4x_2 - x_3 - 7x_4 = -15, \\
E_4 &: \; 3x_2 + 3x_3 + 2x_4 = 8.
\end{aligned}$$

For simplicity, the new equations are again labeled $E_1$, $E_2$, $E_3$, and $E_4$.

In the new system, $E_2$ is used to eliminate the unknown $x_2$ from $E_3$ and $E_4$ by performing
$(E_3 - 4E_2) \to (E_3)$ and $(E_4 + 3E_2) \to (E_4)$. This results in

$$\begin{aligned}
E_1 &: \; x_1 + x_2 + 3x_4 = 4, \\
E_2 &: \; -x_2 - x_3 - 5x_4 = -7, \\
E_3 &: \; 3x_3 + 13x_4 = 13, \\
E_4 &: \; -13x_4 = -13.
\end{aligned} \qquad (6.3)$$

The system of equations (6.3) is now in triangular (or reduced) form and can be solved
for the unknowns by a backward-substitution process. Since $E_4$ implies $x_4 = 1$, we can
solve $E_3$ for $x_3$ to give

$$x_3 = \tfrac{1}{3}(13 - 13x_4) = \tfrac{1}{3}(13 - 13) = 0.$$

Continuing, $E_2$ gives

$$x_2 = -(-7 + 5x_4 + x_3) = -(-7 + 5 + 0) = 2,$$

and $E_1$ gives

$$x_1 = 4 - 3x_4 - x_2 = 4 - 3 - 2 = -1.$$

The solution to system (6.3), and consequently to system (6.2), is therefore $x_1 = -1$,
$x_2 = 2$, $x_3 = 0$, and $x_4 = 1$.
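The three equation operations translate directly into operations on lists of coefficients. The Python sketch below is illustrative only: the helper names `scale`, `add_multiple`, and `swap` are assumptions, rows are indexed from 0 rather than 1, and exact rational arithmetic (`fractions.Fraction`) is used so that no round-off enters. It replays the eliminations that take system (6.2) to system (6.3).

```python
from fractions import Fraction


def scale(rows, i, lam):
    """(lam * E_i) -> (E_i), for a nonzero constant lam."""
    rows[i] = [lam * v for v in rows[i]]


def add_multiple(rows, i, j, lam):
    """(E_i + lam * E_j) -> (E_i)."""
    rows[i] = [vi + lam * vj for vi, vj in zip(rows[i], rows[j])]


def swap(rows, i, j):
    """(E_i) <-> (E_j)."""
    rows[i], rows[j] = rows[j], rows[i]


# System (6.2): coefficients of x1..x4 followed by the right-hand side.
system = [[Fraction(v) for v in row] for row in [
    [1, 1, 0, 3, 4],
    [2, 1, -1, 1, 1],
    [3, -1, -1, 2, -3],
    [-1, 2, 3, -1, 4],
]]

# Rows are 0-indexed here, so E_1 is row 0, E_2 is row 1, and so on.
add_multiple(system, 1, 0, -2)   # (E2 - 2E1) -> (E2)
add_multiple(system, 2, 0, -3)   # (E3 - 3E1) -> (E3)
add_multiple(system, 3, 0, 1)    # (E4 +  E1) -> (E4)
add_multiple(system, 2, 1, -4)   # (E3 - 4E2) -> (E3)
add_multiple(system, 3, 1, 3)    # (E4 + 3E2) -> (E4)

for row in system:
    print([str(v) for v in row])
```

Only `add_multiple` is needed for this system; `scale` and `swap` are included because they complete the set of permitted operations.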


Matrices and Vectors 


When performing the calculations in the Illustration, we would not need to write out the full 
equations at each step or to carry the variables x, x2, x3, and x4 through the calculations, if 
they always remained in the same column. The only variation from system to system occurs 
in the coefficients of the unknowns and in the values on the right side of the equations. For 
this reason, a linear system is often replaced by a matrix, which contains all the information 
about the system that is necessary to determine its solution, but in a compact form, and one 
that is easily represented in a computer. 


Definition 6.1 An n × m (n by m) matrix is a rectangular array of elements with n rows and m columns
in which not only is the value of an element important, but also its position in the array.


The notation for an n × m matrix will be a capital letter such as A for the matrix and
lowercase letters with double subscripts, such as $a_{ij}$, to refer to the entry at the intersection
of the ith row and jth column; that is,

$$A = [a_{ij}] = \begin{bmatrix}
a_{11} & a_{12} & \cdots & a_{1m} \\
a_{21} & a_{22} & \cdots & a_{2m} \\
\vdots & \vdots & & \vdots \\
a_{n1} & a_{n2} & \cdots & a_{nm}
\end{bmatrix}.$$


Example 1 Determine the size and respective entries of the matrix

$$A = \begin{bmatrix} 2 & -1 & 7 \\ 3 & 1 & 0 \end{bmatrix}.$$

Solution The matrix has two rows and three columns so it is of size 2 × 3. Its entries are
described by $a_{11} = 2$, $a_{12} = -1$, $a_{13} = 7$, $a_{21} = 3$, $a_{22} = 1$, and $a_{23} = 0$.


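The 1 × n matrix

$$A = [a_{11} \;\; a_{12} \;\; \cdots \;\; a_{1n}]$$

is called an n-dimensional row vector, and an n × 1 matrix

$$A = \begin{bmatrix} a_{11} \\ a_{21} \\ \vdots \\ a_{n1} \end{bmatrix}$$

is called an n-dimensional column vector. Usually the unnecessary subscripts are omitted
for vectors, and a boldface lowercase letter is used for notation. Thus

$$\mathbf{x} = \begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{bmatrix}$$

denotes a column vector, and

$$\mathbf{y} = [y_1 \;\; y_2 \;\; \cdots \;\; y_n]$$

a row vector. In addition, row vectors often have commas inserted between the entries to
make the separation clearer. So you might see $\mathbf{y}$ written as $\mathbf{y} = [y_1, y_2, \ldots, y_n]$.

An n × (n + 1) matrix can be used to represent the linear system

$$\begin{aligned}
a_{11}x_1 + a_{12}x_2 + \cdots + a_{1n}x_n &= b_1, \\
a_{21}x_1 + a_{22}x_2 + \cdots + a_{2n}x_n &= b_2, \\
&\;\;\vdots \\
a_{n1}x_1 + a_{n2}x_2 + \cdots + a_{nn}x_n &= b_n,
\end{aligned}$$

by first constructing

$$A = [a_{ij}] = \begin{bmatrix}
a_{11} & a_{12} & \cdots & a_{1n} \\
a_{21} & a_{22} & \cdots & a_{2n} \\
\vdots & \vdots & & \vdots \\
a_{n1} & a_{n2} & \cdots & a_{nn}
\end{bmatrix}
\quad \text{and} \quad
\mathbf{b} = \begin{bmatrix} b_1 \\ b_2 \\ \vdots \\ b_n \end{bmatrix}$$

and then combining them to form

$$[A, \mathbf{b}] = \left[\begin{array}{cccc:c}
a_{11} & a_{12} & \cdots & a_{1n} & b_1 \\
a_{21} & a_{22} & \cdots & a_{2n} & b_2 \\
\vdots & \vdots & & \vdots & \vdots \\
a_{n1} & a_{n2} & \cdots & a_{nn} & b_n
\end{array}\right],$$

where the vertical dotted line is used to separate the coefficients of the unknowns from the
values on the right-hand side of the equations. The array $[A, \mathbf{b}]$ is called an augmented
matrix.

Augmented refers to the fact that
the right-hand side of the system
has been included in the matrix.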
A technique similar to Gaussian
elimination first appeared during 
the Han dynasty in China in the 
text Nine Chapters on the 
Mathematical Art, which was 
written about 200 B.C.E. Joseph 
Louis Lagrange (1736-1813) 
described a technique similar to 
this procedure in 1778 for the 
case when the value of each 
equation is 0. Gauss gave a more 
general description in Theoria 
Motus corporum coelestium 
sectionibus solem ambientium, 
which described the least squares 
technique he used in 1801 to 
determine the orbit of the minor 
planet Ceres. 
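Repeating the operations used on system (6.2) in the Illustration with the matrix notation results in first
considering the augmented matrix:

$$\left[\begin{array}{rrrr:r}
1 & 1 & 0 & 3 & 4 \\
2 & 1 & -1 & 1 & 1 \\
3 & -1 & -1 & 2 & -3 \\
-1 & 2 & 3 & -1 & 4
\end{array}\right].$$

Performing the same operations as before produces the augmented matrices

$$\left[\begin{array}{rrrr:r}
1 & 1 & 0 & 3 & 4 \\
0 & -1 & -1 & -5 & -7 \\
0 & -4 & -1 & -7 & -15 \\
0 & 3 & 3 & 2 & 8
\end{array}\right]
\quad \text{and} \quad
\left[\begin{array}{rrrr:r}
1 & 1 & 0 & 3 & 4 \\
0 & -1 & -1 & -5 & -7 \\
0 & 0 & 3 & 13 & 13 \\
0 & 0 & 0 & -13 & -13
\end{array}\right].$$

The final matrix can now be transformed into its corresponding linear system, and solutions
for $x_1$, $x_2$, $x_3$, and $x_4$ can be obtained. The procedure is called Gaussian elimination
with backward substitution.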

The general Gaussian elimination procedure applied to the linear system

$$\begin{aligned}
E_1 &: \; a_{11}x_1 + a_{12}x_2 + \cdots + a_{1n}x_n = b_1, \\
E_2 &: \; a_{21}x_1 + a_{22}x_2 + \cdots + a_{2n}x_n = b_2, \\
&\;\;\vdots \\
E_n &: \; a_{n1}x_1 + a_{n2}x_2 + \cdots + a_{nn}x_n = b_n,
\end{aligned} \qquad (6.4)$$

is handled in a similar manner. First form the augmented matrix $\tilde{A}$:

$$\tilde{A} = [A, \mathbf{b}] = \left[\begin{array}{cccc:c}
a_{11} & a_{12} & \cdots & a_{1n} & a_{1,n+1} \\
a_{21} & a_{22} & \cdots & a_{2n} & a_{2,n+1} \\
\vdots & \vdots & & \vdots & \vdots \\
a_{n1} & a_{n2} & \cdots & a_{nn} & a_{n,n+1}
\end{array}\right], \qquad (6.5)$$

where $A$ denotes the matrix formed by the coefficients. The entries in the (n + 1)st column
are the values of $\mathbf{b}$; that is, $a_{i,n+1} = b_i$ for each $i = 1, 2, \ldots, n$.

Provided $a_{11} \neq 0$, we perform the operations corresponding to

$$(E_j - (a_{j1}/a_{11})E_1) \to (E_j) \quad \text{for each } j = 2, 3, \ldots, n$$

to eliminate the coefficient of $x_1$ in each of these rows. Although the entries in rows $2, 3, \ldots, n$
are expected to change, for ease of notation we again denote the entry in the ith row and the
jth column by $a_{ij}$. With this in mind, we follow a sequential procedure for $i = 2, 3, \ldots, n-1$
and perform the operation

$$(E_j - (a_{ji}/a_{ii})E_i) \to (E_j) \quad \text{for each } j = i+1, i+2, \ldots, n,$$

provided $a_{ii} \neq 0$. This eliminates (changes the coefficient to zero) $x_i$ in each row below the
ith for all values of $i = 1, 2, \ldots, n-1$. The resulting matrix has the form:

$$\tilde{\tilde{A}} = \left[\begin{array}{cccc:c}
a_{11} & a_{12} & \cdots & a_{1n} & a_{1,n+1} \\
0 & a_{22} & \cdots & a_{2n} & a_{2,n+1} \\
\vdots & \ddots & \ddots & \vdots & \vdots \\
0 & \cdots & 0 & a_{nn} & a_{n,n+1}
\end{array}\right],$$

where, except in the first row, the values of $a_{ij}$ are not expected to agree with those in the
original matrix $\tilde{A}$. The matrix $\tilde{\tilde{A}}$ represents a linear system with the same solution set as the
original system.
The new linear system is triangular,

$$\begin{aligned}
a_{11}x_1 + a_{12}x_2 + \cdots + a_{1n}x_n &= a_{1,n+1}, \\
a_{22}x_2 + \cdots + a_{2n}x_n &= a_{2,n+1}, \\
&\;\;\vdots \\
a_{nn}x_n &= a_{n,n+1},
\end{aligned}$$

so backward substitution can be performed. Solving the nth equation for $x_n$ gives

$$x_n = \frac{a_{n,n+1}}{a_{nn}}.$$

Solving the (n − 1)st equation for $x_{n-1}$ and using the known value for $x_n$ yields

$$x_{n-1} = \frac{a_{n-1,n+1} - a_{n-1,n}x_n}{a_{n-1,n-1}}.$$

Continuing this process, we obtain

$$x_i = \frac{a_{i,n+1} - a_{i,n}x_n - a_{i,n-1}x_{n-1} - \cdots - a_{i,i+1}x_{i+1}}{a_{ii}} = \frac{a_{i,n+1} - \sum_{j=i+1}^{n} a_{ij}x_j}{a_{ii}},$$

for each $i = n-1, n-2, \ldots, 2, 1$.
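The backward-substitution formula can be sketched in a few lines of Python. The function name `back_substitute` and the list-of-rows representation of the augmented matrix are illustrative assumptions; the triangular system (6.3) serves as the test case.

```python
def back_substitute(aug):
    """Solve an upper-triangular system stored as an n x (n+1) augmented matrix.

    Row i holds a_{i1}, ..., a_{in} followed by a_{i,n+1} (0-indexed in the code).
    """
    n = len(aug)
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        if aug[i][i] == 0:
            raise ZeroDivisionError("zero diagonal entry: no unique solution")
        # Sum of a_ij * x_j over the already-computed unknowns j = i+1, ..., n.
        s = sum(aug[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (aug[i][n] - s) / aug[i][i]
    return x


# Triangular system (6.3).
triangular = [
    [1, 1, 0, 3, 4],
    [0, -1, -1, -5, -7],
    [0, 0, 3, 13, 13],
    [0, 0, 0, -13, -13],
]
solution = back_substitute(triangular)
print(solution)
```

The loop runs from the last row upward, so each $x_i$ is computed only after every $x_j$ with $j > i$ is known, exactly as in the displayed formula.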
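The Gaussian elimination procedure is described more precisely, although more intricately,
by forming a sequence of augmented matrices $\tilde{A}^{(1)}, \tilde{A}^{(2)}, \ldots, \tilde{A}^{(n)}$, where $\tilde{A}^{(1)}$ is the matrix
$\tilde{A}$ given in (6.5) and $\tilde{A}^{(k)}$, for each $k = 2, 3, \ldots, n$, has entries $a_{ij}^{(k)}$, where:

$$a_{ij}^{(k)} = \begin{cases}
a_{ij}^{(k-1)}, & \text{when } i = 1, 2, \ldots, k-1 \text{ and } j = 1, 2, \ldots, n+1, \\[4pt]
0, & \text{when } i = k, k+1, \ldots, n \text{ and } j = 1, 2, \ldots, k-1, \\[4pt]
a_{ij}^{(k-1)} - \dfrac{a_{i,k-1}^{(k-1)}}{a_{k-1,k-1}^{(k-1)}}\, a_{k-1,j}^{(k-1)}, & \text{when } i = k, k+1, \ldots, n \text{ and } j = k, k+1, \ldots, n+1.
\end{cases}$$

Thus

$$\tilde{A}^{(k)} = \left[\begin{array}{cccccccc:c}
a_{11}^{(1)} & a_{12}^{(1)} & a_{13}^{(1)} & \cdots & a_{1,k-1}^{(1)} & a_{1k}^{(1)} & \cdots & a_{1n}^{(1)} & a_{1,n+1}^{(1)} \\
0 & a_{22}^{(2)} & a_{23}^{(2)} & \cdots & a_{2,k-1}^{(2)} & a_{2k}^{(2)} & \cdots & a_{2n}^{(2)} & a_{2,n+1}^{(2)} \\
\vdots & \ddots & \ddots & & \vdots & \vdots & & \vdots & \vdots \\
\vdots & & \ddots & \ddots & a_{k-1,k-1}^{(k-1)} & a_{k-1,k}^{(k-1)} & \cdots & a_{k-1,n}^{(k-1)} & a_{k-1,n+1}^{(k-1)} \\
\vdots & & & & 0 & a_{kk}^{(k)} & \cdots & a_{kn}^{(k)} & a_{k,n+1}^{(k)} \\
\vdots & & & & \vdots & \vdots & & \vdots & \vdots \\
0 & \cdots & \cdots & \cdots & 0 & a_{nk}^{(k)} & \cdots & a_{nn}^{(k)} & a_{n,n+1}^{(k)}
\end{array}\right] \qquad (6.6)$$

represents the equivalent linear system for which the variable $x_{k-1}$ has just been eliminated
from equations $E_k, E_{k+1}, \ldots, E_n$.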
The procedure will fail if one of the elements $a_{11}^{(1)}, a_{22}^{(2)}, a_{33}^{(3)}, \ldots, a_{n-1,n-1}^{(n-1)}, a_{nn}^{(n)}$ is zero
because the step

$$\left(E_i - \frac{a_{i,k}^{(k)}}{a_{kk}^{(k)}}(E_k)\right) \to (E_i)$$

either cannot be performed (this occurs if one of $a_{11}^{(1)}, \ldots, a_{n-1,n-1}^{(n-1)}$ is zero), or the backward
substitution cannot be accomplished (in the case $a_{nn}^{(n)} = 0$). The system may still have a
solution, but the technique for finding the solution must be altered. An illustration is given
in the following example.


Example 2 Represent the linear system

$$\begin{aligned}
E_1 &: \; x_1 - x_2 + 2x_3 - x_4 = -8, \\
E_2 &: \; 2x_1 - 2x_2 + 3x_3 - 3x_4 = -20, \\
E_3 &: \; x_1 + x_2 + x_3 = -2, \\
E_4 &: \; x_1 - x_2 + 4x_3 + 3x_4 = 4,
\end{aligned}$$

as an augmented matrix and use Gaussian Elimination to find its solution.


Solution The augmented matrix is

$$\tilde{A} = \tilde{A}^{(1)} = \left[\begin{array}{rrrr:r}
1 & -1 & 2 & -1 & -8 \\
2 & -2 & 3 & -3 & -20 \\
1 & 1 & 1 & 0 & -2 \\
1 & -1 & 4 & 3 & 4
\end{array}\right].$$

Performing the operations

$$(E_2 - 2E_1) \to (E_2), \quad (E_3 - E_1) \to (E_3), \quad \text{and} \quad (E_4 - E_1) \to (E_4),$$

gives

$$\tilde{A}^{(2)} = \left[\begin{array}{rrrr:r}
1 & -1 & 2 & -1 & -8 \\
0 & 0 & -1 & -1 & -4 \\
0 & 2 & -1 & 1 & 6 \\
0 & 0 & 2 & 4 & 12
\end{array}\right].$$

The pivot element for a specific
column is the entry that is used to
place zeros in the other entries in
that column.

The diagonal entry $a_{22}^{(2)}$, called the pivot element, is 0, so the procedure cannot continue
in its present form. But operations $(E_i) \leftrightarrow (E_j)$ are permitted, so a search is made of
the elements $a_{32}^{(2)}$ and $a_{42}^{(2)}$ for the first nonzero element. Since $a_{32}^{(2)} \neq 0$, the operation
$(E_2) \leftrightarrow (E_3)$ is performed to obtain a new matrix,

$$\tilde{A}^{(2)'} = \left[\begin{array}{rrrr:r}
1 & -1 & 2 & -1 & -8 \\
0 & 2 & -1 & 1 & 6 \\
0 & 0 & -1 & -1 & -4 \\
0 & 0 & 2 & 4 & 12
\end{array}\right].$$

Since $x_2$ is already eliminated from $E_3$ and $E_4$, $\tilde{A}^{(3)}$ will be $\tilde{A}^{(2)'}$, and the computations
continue with the operation $(E_4 + 2E_3) \to (E_4)$, giving

$$\tilde{A}^{(4)} = \left[\begin{array}{rrrr:r}
1 & -1 & 2 & -1 & -8 \\
0 & 2 & -1 & 1 & 6 \\
0 & 0 & -1 & -1 & -4 \\
0 & 0 & 0 & 2 & 4
\end{array}\right].$$

Finally, the matrix is converted back into a linear system that has a solution equivalent to
the solution of the original system and the backward substitution is applied:

$$x_4 = \frac{4}{2} = 2,$$

$$x_3 = \frac{[-4 - (-1)x_4]}{-1} = 2,$$

$$x_2 = \frac{[6 - x_4 - (-1)x_3]}{2} = 3,$$

$$x_1 = \frac{[-8 - (-1)x_4 - 2x_3 - (-1)x_2]}{1} = -7.$$

Example 2 illustrates what is done if $a_{kk}^{(k)} = 0$ for some $k = 1, 2, \ldots, n-1$. The kth
column of $\tilde{A}^{(k-1)}$ from the kth row to the nth row is searched for the first nonzero entry. If
$a_{pk}^{(k)} \neq 0$ for some $p$, with $k+1 \le p \le n$, then the operation $(E_k) \leftrightarrow (E_p)$ is performed to
obtain $\tilde{A}^{(k-1)'}$. The procedure can then be continued to form $\tilde{A}^{(k)}$, and so on. If $a_{pk}^{(k)} = 0$ for
each $p$, it can be shown (see Theorem 6.17 on page 398) that the linear system does not
have a unique solution and the procedure stops. Finally, if $a_{nn}^{(n)} = 0$, the linear system does
not have a unique solution, and again the procedure stops.

Algorithm 6.1 summarizes Gaussian elimination with backward substitution. The algorithm
incorporates pivoting when one of the pivots $a_{kk}^{(k)}$ is 0 by interchanging the kth row
with the pth row, where $p$ is the smallest integer greater than $k$ for which $a_{pk}^{(k)} \neq 0$.


Gaussian Elimination with Backward Substitution (Algorithm 6.1)

To solve the $n \times n$ linear system

$$E_1:\ a_{11}x_1 + a_{12}x_2 + \cdots + a_{1n}x_n = a_{1,n+1}$$
$$E_2:\ a_{21}x_1 + a_{22}x_2 + \cdots + a_{2n}x_n = a_{2,n+1}$$
$$\vdots$$
$$E_n:\ a_{n1}x_1 + a_{n2}x_2 + \cdots + a_{nn}x_n = a_{n,n+1}$$

INPUT number of unknowns and equations $n$; augmented matrix $A = [a_{ij}]$, where $1 \le i \le n$ and $1 \le j \le n + 1$.
OUTPUT solution $x_1, x_2, \ldots, x_n$ or message that the linear system has no unique solution.

Step 1 For $i = 1, \ldots, n - 1$ do Steps 2-4. (Elimination process.)

    Step 2 Let $p$ be the smallest integer with $i \le p \le n$ and $a_{pi} \neq 0$.
           If no integer $p$ can be found
           then OUTPUT ('no unique solution exists');
                STOP.

    Step 3 If $p \neq i$ then perform $(E_p) \leftrightarrow (E_i)$.

    Step 4 For $j = i + 1, \ldots, n$ do Steps 5 and 6.
        Step 5 Set $m_{ji} = a_{ji}/a_{ii}$.
        Step 6 Perform $(E_j - m_{ji}E_i) \to (E_j)$;

Step 7 If $a_{nn} = 0$ then OUTPUT ('no unique solution exists');
       STOP.

Step 8 Set $x_n = a_{n,n+1}/a_{nn}$. (Start backward substitution.)

Step 9 For $i = n - 1, \ldots, 1$ set $x_i = \left[a_{i,n+1} - \sum_{j=i+1}^{n} a_{ij}x_j\right] \Big/ a_{ii}$.

Step 10 OUTPUT $(x_1, \ldots, x_n)$; (Procedure completed successfully.)
        STOP.
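The steps of Algorithm 6.1 can be sketched in Python as follows. This is only an illustrative sketch, not part of the text: the function name `gaussian_elimination` is hypothetical, indices are 0-based, and exact rational arithmetic (`fractions.Fraction`) is an assumption made here so that the zero tests in Steps 2 and 7 are not affected by round-off.

```python
from fractions import Fraction

def gaussian_elimination(aug):
    """Solve an n x n system given as an n x (n+1) augmented matrix.

    Mirrors Steps 1-10 of Algorithm 6.1 with 0-based indices.
    Returns the solution as a list, or None if no unique solution exists.
    """
    a = [[Fraction(v) for v in row] for row in aug]   # work on an exact copy
    n = len(a)
    for i in range(n - 1):                            # Step 1
        # Step 2: smallest p >= i with a nonzero entry in column i
        p = next((r for r in range(i, n) if a[r][i] != 0), None)
        if p is None:
            return None
        if p != i:                                    # Step 3: row interchange
            a[p], a[i] = a[i], a[p]
        for j in range(i + 1, n):                     # Steps 4-6
            m = a[j][i] / a[i][i]
            a[j] = [ajk - m * aik for ajk, aik in zip(a[j], a[i])]
    if a[n - 1][n - 1] == 0:                          # Step 7
        return None
    x = [Fraction(0)] * n
    x[n - 1] = a[n - 1][n] / a[n - 1][n - 1]          # Step 8
    for i in range(n - 2, -1, -1):                    # Step 9
        s = sum(a[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (a[i][n] - s) / a[i][i]
    return x                                          # Step 10

# Augmented matrix of Example 2
example2 = [[1, -1, 2, -1, -8],
            [2, -2, 3, -3, -20],
            [1, 1, 1, 0, -2],
            [1, -1, 4, 3, 4]]
solution = gaussian_elimination(example2)
```

Applied to the augmented matrix of Example 2, the sketch performs the same single interchange of the second and third rows and reproduces the solution found there.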


To define matrices and perform Gaussian elimination using Maple, first access the LinearAlgebra library using the command

with(LinearAlgebra)

To define the matrix $\tilde{A}^{(1)}$ of Example 2, which we will call AA, use the command

AA := Matrix([[1, -1, 2, -1, -8], [2, -2, 3, -3, -20], [1, 1, 1, 0, -2], [1, -1, 4, 3, 4]])

This lists the entries, by row, of the augmented matrix AA $\equiv \tilde{A}^{(1)}$.

The function RowOperation(AA, [i, j], m) performs the operation $(E_i + mE_j) \to (E_i)$, and the same command without the last parameter, that is, RowOperation(AA, [i, j]), performs the operation $(E_i) \leftrightarrow (E_j)$. So the sequence of operations

AA1 := RowOperation(AA, [2, 1], -2)
AA2 := RowOperation(AA1, [3, 1], -1)
AA3 := RowOperation(AA2, [4, 1], -1)
AA4 := RowOperation(AA3, [2, 3])
AA5 := RowOperation(AA4, [4, 3], 2)

gives the reduction to AA5 $\equiv \tilde{A}^{(4)}$.

Gaussian elimination is a standard routine in the LinearAlgebra package of Maple, and the single command

AA5 := GaussianElimination(AA)

returns this same reduced matrix. In either case, the final operation

x := BackwardSubstitute(AA5)

gives the solution x, which has $x_1 = -7$, $x_2 = 3$, $x_3 = 2$, and $x_4 = 2$.


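Illustration The purpose of this illustration is to show what can happen if Algorithm 6.1 fails. The computations will be done simultaneously on two linear systems:

$$\begin{aligned} x_1 + x_2 + x_3 &= 4, \\ 2x_1 + 2x_2 + x_3 &= 6, \\ x_1 + x_2 + 2x_3 &= 6, \end{aligned} \qquad \text{and} \qquad \begin{aligned} x_1 + x_2 + x_3 &= 4, \\ 2x_1 + 2x_2 + x_3 &= 4, \\ x_1 + x_2 + 2x_3 &= 6. \end{aligned}$$

These systems produce the augmented matrices

$$\tilde{A} = \left[\begin{array}{rrr|r} 1 & 1 & 1 & 4 \\ 2 & 2 & 1 & 6 \\ 1 & 1 & 2 & 6 \end{array}\right] \qquad \text{and} \qquad \tilde{A} = \left[\begin{array}{rrr|r} 1 & 1 & 1 & 4 \\ 2 & 2 & 1 & 4 \\ 1 & 1 & 2 & 6 \end{array}\right].$$

Since $a_{11} = 1$, we perform $(E_2 - 2E_1) \to (E_2)$ and $(E_3 - E_1) \to (E_3)$ to produce

$$\tilde{A} = \left[\begin{array}{rrr|r} 1 & 1 & 1 & 4 \\ 0 & 0 & -1 & -2 \\ 0 & 0 & 1 & 2 \end{array}\right] \qquad \text{and} \qquad \tilde{A} = \left[\begin{array}{rrr|r} 1 & 1 & 1 & 4 \\ 0 & 0 & -1 & -4 \\ 0 & 0 & 1 & 2 \end{array}\right].$$

At this point, $a_{22} = a_{32} = 0$. The algorithm requires that the procedure be halted, and no solution to either system is obtained. Writing the equations for each system gives

$$\begin{aligned} x_1 + x_2 + x_3 &= 4, \\ -x_3 &= -2, \\ x_3 &= 2, \end{aligned} \qquad \text{and} \qquad \begin{aligned} x_1 + x_2 + x_3 &= 4, \\ -x_3 &= -4, \\ x_3 &= 2. \end{aligned}$$

The first linear system has an infinite number of solutions, which can be described by $x_3 = 2$, $x_2 = 2 - x_1$, and $x_1$ arbitrary.

The second system leads to the contradiction $x_3 = 2$ and $x_3 = 4$, so no solution exists. In each case, however, there is no unique solution, as we conclude from Algorithm 6.1.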


Although Algorithm 6.1 can be viewed as the construction of the augmented matrices $\tilde{A}^{(1)}, \ldots, \tilde{A}^{(n)}$, the computations can be performed in a computer using only one $n \times (n + 1)$ array for storage. At each step we simply replace the previous value of $a_{ij}$ by the new one. In addition, we can store the multipliers $m_{ji}$ in the locations of $a_{ji}$ because $a_{ji}$ has the value 0 for each $i = 1, 2, \ldots, n - 1$ and $j = i + 1, i + 2, \ldots, n$. Thus $A$ can be overwritten by the multipliers in the entries that are below the main diagonal (that is, the entries of the form $a_{ji}$, with $j > i$) and by the newly computed entries of $\tilde{A}^{(n)}$ on and above the main diagonal (the entries of the form $a_{ij}$, with $j \ge i$). These values can be used to solve other linear systems involving the original matrix $A$, as we will see in Section 6.5.
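The single-array storage scheme can be sketched as follows. This is a minimal illustration under stated assumptions: the function name `eliminate_in_place` is hypothetical, and the sketch omits row interchanges, so it assumes every pivot it meets is nonzero.

```python
def eliminate_in_place(a):
    """Overwrite the n x (n+1) array `a` during elimination.

    After the call, entries on and above the main diagonal hold the
    reduced system, and each entry below the diagonal holds the
    multiplier m_ji that was used to zero it.
    Assumes nonzero pivots (no row interchanges are performed).
    """
    n = len(a)
    for i in range(n - 1):
        for j in range(i + 1, n):
            m = a[j][i] / a[i][i]
            for k in range(i + 1, n + 1):   # columns right of the pivot
                a[j][k] -= m * a[i][k]
            a[j][i] = m                     # store m_ji where the 0 would be
    return a

# Small 2 x 2 system: 2x1 + x2 = 3, 4x1 + 5x2 = 9
work = eliminate_in_place([[2.0, 1.0, 3.0],
                           [4.0, 5.0, 9.0]])
```

In the small example the entry below the diagonal ends up holding the multiplier 2 rather than 0, while the rest of the second row holds the reduced equation.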


Operation Counts 


Both the amount of time required to complete the calculations and the subsequent round-off 
error depend on the number of floating-point arithmetic operations needed to solve a routine 
problem. In general, the amount of time required to perform a multiplication or division on a 
computer is approximately the same and is considerably greater than that required to perform 
an addition or subtraction. The actual differences in execution time, however, depend on the 
particular computing system. To demonstrate the counting operations for a given method, 
we will count the operations required to solve a typical linear system of n equations in 
n unknowns using Algorithm 6.1. We will keep the count of the additions/subtractions 
separate from the count of the multiplications/divisions because of the time differential. 

No arithmetic operations are performed until Steps 5 and 6 in the algorithm. Step 5 requires that $(n - i)$ divisions be performed. The replacement of the equation $E_j$ by $(E_j - m_{ji}E_i)$ in Step 6 requires that $m_{ji}$ be multiplied by each term in $E_i$, resulting in a total of $(n - i)(n - i + 1)$ multiplications. After this is completed, each term of the resulting equation is subtracted from the corresponding term in $E_j$. This requires $(n - i)(n - i + 1)$ subtractions. For each $i = 1, 2, \ldots, n - 1$, the operations required in Steps 5 and 6 are as follows.

Multiplications/divisions

$$(n - i) + (n - i)(n - i + 1) = (n - i)(n - i + 2).$$

Additions/subtractions

$$(n - i)(n - i + 1).$$


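The total number of operations required by Steps 5 and 6 is obtained by summing the operation counts for each $i$. Recalling from calculus that

$$\sum_{j=1}^{m} 1 = m, \qquad \sum_{j=1}^{m} j = \frac{m(m+1)}{2}, \qquad \text{and} \qquad \sum_{j=1}^{m} j^2 = \frac{m(m+1)(2m+1)}{6},$$

we have the following operation counts.

Multiplications/divisions

$$\begin{aligned} \sum_{i=1}^{n-1} (n - i)(n - i + 2) &= \sum_{i=1}^{n-1} (n^2 - 2ni + i^2 + 2n - 2i) \\ &= \sum_{i=1}^{n-1} (n - i)^2 + 2\sum_{i=1}^{n-1} (n - i) = \sum_{i=1}^{n-1} i^2 + 2\sum_{i=1}^{n-1} i \\ &= \frac{(n-1)n(2n-1)}{6} + 2\,\frac{(n-1)n}{2} = \frac{2n^3 + 3n^2 - 5n}{6}. \end{aligned}$$

Additions/subtractions

$$\begin{aligned} \sum_{i=1}^{n-1} (n - i)(n - i + 1) &= \sum_{i=1}^{n-1} (n^2 - 2ni + i^2 + n - i) \\ &= \sum_{i=1}^{n-1} (n - i)^2 + \sum_{i=1}^{n-1} (n - i) = \sum_{i=1}^{n-1} i^2 + \sum_{i=1}^{n-1} i \\ &= \frac{(n-1)n(2n-1)}{6} + \frac{(n-1)n}{2} = \frac{n^3 - n}{3}. \end{aligned}$$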


The only other steps in Algorithm 6.1 that involve arithmetic operations are those required for backward substitution, Steps 8 and 9. Step 8 requires one division. Step 9 requires $(n - i)$ multiplications and $(n - i - 1)$ additions for each summation term and then one subtraction and one division. The total number of operations in Steps 8 and 9 is as follows.

Multiplications/divisions

$$1 + \sum_{i=1}^{n-1} \bigl((n - i) + 1\bigr) = 1 + \left(\sum_{i=1}^{n-1} (n - i)\right) + n - 1 = n + \sum_{i=1}^{n-1} (n - i) = n + \sum_{i=1}^{n-1} i = \frac{n^2 + n}{2}.$$

Additions/subtractions

$$\sum_{i=1}^{n-1} \bigl((n - i - 1) + 1\bigr) = \sum_{i=1}^{n-1} (n - i) = \sum_{i=1}^{n-1} i = \frac{n^2 - n}{2}.$$


The total number of arithmetic operations in Algorithm 6.1 is, therefore:

Multiplications/divisions

$$\frac{2n^3 + 3n^2 - 5n}{6} + \frac{n^2 + n}{2} = \frac{n^3}{3} + n^2 - \frac{n}{3}.$$

Additions/subtractions

$$\frac{n^3 - n}{3} + \frac{n^2 - n}{2} = \frac{n^3}{3} + \frac{n^2}{2} - \frac{5n}{6}.$$

For large $n$, the total number of multiplications and divisions is approximately $n^3/3$, as is the total number of additions and subtractions. Thus the amount of computation and the time required increases with $n$ in proportion to $n^3$, as shown in Table 6.1.

Table 6.1

    n      Multiplications/Divisions    Additions/Subtractions
    3                  17                         11
    10                430                        375
    50             44,150                     42,875
    100           343,300                    338,250
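The entries of Table 6.1 can be checked by tallying the operations step by step rather than using the closed forms. The following is a sketch only; the function name `elimination_counts` is hypothetical, and it counts operations exactly as described for Steps 5, 6, 8, and 9 above.

```python
def elimination_counts(n):
    """Tally (multiplications/divisions, additions/subtractions)
    for Algorithm 6.1 on an n x n system, step by step."""
    mult_div = 0
    add_sub = 0
    for i in range(1, n):                        # elimination, i = 1..n-1
        mult_div += (n - i)                      # Step 5: divisions
        mult_div += (n - i) * (n - i + 1)        # Step 6: multiplications
        add_sub += (n - i) * (n - i + 1)         # Step 6: subtractions
    mult_div += 1                                # Step 8: one division
    for i in range(1, n):                        # Step 9, i = n-1..1
        mult_div += (n - i) + 1                  # products plus one division
        add_sub += (n - i - 1) + 1               # sum plus one subtraction
    return mult_div, add_sub

table = {n: elimination_counts(n) for n in (3, 10, 50, 100)}
```

The tallies agree with the closed forms $n^3/3 + n^2 - n/3$ and $n^3/3 + n^2/2 - 5n/6$ for the four values of $n$ in the table.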


EXERCISE SET 6.1 


1. For each of the following linear systems, obtain a solution by graphical methods, if possible. Explain the results from a geometrical standpoint.

   a. $x_1 + 2x_2 = 3$,
      $x_1 - x_2 = 0$.
   b. $x_1 + 2x_2 = 3$,
      $2x_1 + 4x_2 = 6$.
   c. $x_1 + 2x_2 = 0$,
      $2x_1 + 4x_2 = 0$.
   d. $2x_1 + x_2 = -1$,
      $4x_1 + 2x_2 = -2$,
      $x_1 - 3x_2 = 5$.

2. For each of the following linear systems, obtain a solution by graphical methods, if possible. Explain 
the results from a geometrical standpoint. 


a. xX, + 2x. = 0, b. X, + 2x) = 3, Cc. 2x, + %=-l, d. 2x; + y+2x3= 1, 


PS ee xX = 0. —2x; — 4x = 6. xi + x2 = 2, 2x +4. -13=- 1. 


xa 3x2 = 5: 
3. Use Gaussian elimination with backward substitution and two-digit rounding arithmetic to solve the following linear systems. Do not reorder the equations. (The exact solution to each system is $x_1 = 1$, $x_2 = -1$, $x_3 = 3$.)

   a. $4x_1 - x_2 + x_3 = 8$,
      $2x_1 + 5x_2 + 2x_3 = 3$,
      $x_1 + 2x_2 + 4x_3 = 11$.
   b. $4x_1 + x_2 + 2x_3 = 9$,
      $2x_1 + 4x_2 - x_3 = -5$,
      $x_1 + x_2 - 3x_3 = -9$.


4. Use Gaussian elimination with backward substitution and two-digit rounding arithmetic to solve the following linear systems. Do not reorder the equations. (The exact solution to each system is $x_1 = -1$, $x_2 = 1$, $x_3 = 3$.)

   a. $-x_1 + 4x_2 + x_3 = 8$,
      $\frac{5}{3}x_1 + \frac{2}{3}x_2 + \frac{2}{3}x_3 = 1$,
      $2x_1 + x_2 + 4x_3 = 11$.
   b. $4x_1 + 2x_2 - x_3 = -5$,
      $\frac{1}{9}x_1 + \frac{1}{9}x_2 - \frac{1}{3}x_3 = -1$,
      $x_1 + 4x_2 + 2x_3 = 9$.

5. Use the Gaussian Elimination Algorithm to solve the following linear systems, if possible, and determine whether row interchanges are necessary:


a. X,— %2+3x3 = 2, b. 2x, — 1.5x. + 3x3 = 1, 
3x, —3m + 63 =, —x| + 2x3 = 3, 
Xp + XxX = 3. 4x, — 4.5x) + 533 = I. 
ce. 2x, = 3. d. xX, + x2 + x =2, 
x, + 1.5x2 = 45, 2x, +42 - 144+ m= 1, 
— 3x, + 0.5x3 = —6.6, 4x, — xX) — 2x3 + 2x4 = 0, 
2x,- 2+ 23%+2%4,= 0.8. 3x, —X2— 34+ 2x4 = —3. 


6. Use the Gaussian Elimination Algorithm to solve the following linear systems, if possible, and deter- 


mine whether row interchanges are necessary: 
1 


a. X. — 2x3 = 4, b. X1— 5X2 + x3 = 4, 
X1}-%»+ x3= 6, 2x, - xX — 2w+%x%=5, 
xy — %=2. Xt xX + 5x3 =2, 


x — 5x + x34+%4,=5. 


c. 2x, —x9+%3—X4 = 6, d. Xj + X + x4 =2, 
X2—x3+x4 = 5, 2x, + m- 6Bm+ m4 = 1, 
oP —xX, +2x%,4+ 343 - x4 = 4, 
X3—-X4 = 3. 3x, — x2 — 43 4+2x4 = 3. 
7. Use Algorithm 6.1 and Maple with Digits:= 10 to solve the following linear systems. 
a Gk t+ 3x) + 73 =9, b. 3.333x, + 15920x2 — 10.333x3 = 15913, 
suit 5X2 + 3x3 = 8, 2.222x, + 16.71%. + 9.612x3 = 28.544, 
5X1 + X2 + 2x3 = 8. 1.561 1x; + 5.179 1x2 + 1.6852x3 = 8.4254. 
c. x + $x + 443 + tual, dad. 2x, + x -— x3+4%4 — 3x5 = 7, 
Sx + tx + x3 + bry=t, x + 2x3 -—x4+ x5 = 2, 
3x + $x) + $3 4+ trai, — 2x. -— x3 4+x%4-— Xs = —S, 
quit 2X9 + 5x3 + txt. 3x, + x — 443 + 5x5 = 6, 
Xp— X.— X3— Xt X%5 = 3. 


8. Use Algorithm 6.1 and Maple with Digits:= 10 to solve the following linear systems. 


a 5x + 5m— fx; =0, b.  2.71x; + x + 1032x3 = 12, 
fm — gmt 3 =1, 4.12x,;— x. + 500x3 = 11.49, 
5x1 + 5x2 + 7x3 = 2. 3.33x1 + 2x. — 200x3 = 41. 

c. mx, + V2x2 — x3+ x4= 0, d. Xp+ x -— 3+ X4- Xs =2, 
exy— Xt + OD = I, 2x1 + 2x2 + x3—- X4+ X5 = 4, 
xt xX — V3x3 + X4=2, 3x, + x2 — 3x3 — 2x4 + 3x5 = 8, 

—x- et tg — V5 4 = 3. 4x; + x2 — x3 + 4x4 — 5x5 = 16, 


16x, — Xyg+ X3—-— Xy— xe. = 32: 


9. Given the linear system

   $2x_1 - 6\alpha x_2 = 3$,
   $3\alpha x_1 - x_2 = \frac{3}{2}$.

   a. Find value(s) of $\alpha$ for which the system has no solutions.
   b. Find value(s) of $\alpha$ for which the system has an infinite number of solutions.
   c. Assuming a unique solution exists for a given $\alpha$, find the solution.

10. Given the linear system

   $x_1 - x_2 + \alpha x_3 = -2$,
   $-x_1 + 2x_2 - \alpha x_3 = 3$,
   $\alpha x_1 + x_2 + x_3 = 2$.

   a. Find value(s) of $\alpha$ for which the system has no solutions.
   b. Find value(s) of $\alpha$ for which the system has an infinite number of solutions.
   c. Assuming a unique solution exists for a given $\alpha$, find the solution.

11. Show that the operations
   a. $(\lambda E_i) \to (E_i)$   b. $(E_i + \lambda E_j) \to (E_i)$   c. $(E_i) \leftrightarrow (E_j)$
   do not change the solution set of a linear system.

12. Gauss-Jordan Method: This method is described as follows. Use the $i$th equation to eliminate not only $x_i$ from the equations $E_{i+1}, E_{i+2}, \ldots, E_n$, as was done in the Gaussian elimination method, but also from $E_1, E_2, \ldots, E_{i-1}$. Upon reducing $[A, \mathbf{b}]$ to:

$$\left[\begin{array}{cccc|c} a_{11}^{(1)} & 0 & \cdots & 0 & a_{1,n+1}^{(1)} \\ 0 & a_{22}^{(2)} & \ddots & \vdots & a_{2,n+1}^{(2)} \\ \vdots & \ddots & \ddots & 0 & \vdots \\ 0 & \cdots & 0 & a_{nn}^{(n)} & a_{n,n+1}^{(n)} \end{array}\right],$$

the solution is obtained by setting

$$x_i = \frac{a_{i,n+1}^{(i)}}{a_{ii}^{(i)}},$$

for each $i = 1, 2, \ldots, n$. This procedure circumvents the backward substitution in the Gaussian elimination. Construct an algorithm for the Gauss-Jordan procedure patterned after that of Algorithm 6.1.

13. Use the Gauss-Jordan method and two-digit rounding arithmetic to solve the systems in Exercise 3. 

14. Repeat Exercise 7 using the Gauss-Jordan method. 

15. a. Show that the Gauss-Jordan method requires

$$\frac{n^3}{2} + n^2 - \frac{n}{2} \quad \text{multiplications/divisions}$$

and

$$\frac{n^3}{2} - \frac{n}{2} \quad \text{additions/subtractions.}$$

    b. Make a table comparing the required operations for the Gauss-Jordan and Gaussian elimination methods for $n = 3, 10, 50, 100$. Which method requires less computation?

16. Consider the following Gaussian-elimination-Gauss-Jordan hybrid method for solving the system (6.4). First, apply the Gaussian-elimination technique to reduce the system to triangular form. Then use the $n$th equation to eliminate the coefficients of $x_n$ in each of the first $n - 1$ rows. After this is completed use the $(n - 1)$st equation to eliminate the coefficients of $x_{n-1}$ in the first $n - 2$ rows, etc. The system will eventually appear as the reduced system in Exercise 12.

    a. Show that this method requires

$$\frac{n^3}{3} + \frac{3}{2}n^2 - \frac{5}{6}n \quad \text{multiplications/divisions}$$

and

$$\frac{n^3}{3} + \frac{n^2}{2} - \frac{5}{6}n \quad \text{additions/subtractions.}$$

    b. Make a table comparing the required operations for the Gaussian elimination, Gauss-Jordan, and hybrid methods, for $n = 3, 10, 50, 100$.


17. Use the hybrid method described in Exercise 16 and two-digit rounding arithmetic to solve the systems 
in Exercise 3. 


18. Repeat Exercise 7 using the method described in Exercise 16. 


19. Suppose that in a biological system there are $n$ species of animals and $m$ sources of food. Let $x_j$ represent the population of the $j$th species, for each $j = 1, \ldots, n$; $b_i$ represent the available daily supply of the $i$th food; and $a_{ij}$ represent the amount of the $i$th food consumed on the average by a member of the $j$th species. The linear system

$$\begin{aligned} a_{11}x_1 + a_{12}x_2 + \cdots + a_{1n}x_n &= b_1, \\ a_{21}x_1 + a_{22}x_2 + \cdots + a_{2n}x_n &= b_2, \\ &\ \ \vdots \\ a_{m1}x_1 + a_{m2}x_2 + \cdots + a_{mn}x_n &= b_m \end{aligned}$$

represents an equilibrium where there is a daily supply of food to precisely meet the average daily consumption of each species.


a. Let 


eNO 
ie) 


1 2 
A= [aij] => 1 0 
0 0 


x = (x) = [1000, 500, 350, 400], and b = (;) = [3500, 2700, 900]. Is there sufficient food 
to satisfy the average daily consumption? 


b. What is the maximum number of animals of each species that could be individually added to the 
system with the supply of food still meeting the consumption? 


c. If species 1 became extinct, how much of an individual increase of each of the remaining 
species could be supported? 


d. If species 2 became extinct, how much of an individual increase of each of the remaining 
species could be supported? 


20. A Fredholm integral equation of the second kind is an equation of the form

$$u(x) = f(x) + \int_a^b K(x, t)\,u(t)\,dt,$$

where $a$ and $b$ and the functions $f$ and $K$ are given. To approximate the function $u$ on the interval $[a, b]$, a partition $x_0 = a < x_1 < \cdots < x_{m-1} < x_m = b$ is selected and the equations

$$u(x_i) = f(x_i) + \int_a^b K(x_i, t)\,u(t)\,dt, \quad \text{for each } i = 0, \ldots, m,$$

are solved for $u(x_0), u(x_1), \ldots, u(x_m)$. The integrals are approximated using quadrature formulas
based on the nodes xo,--- ,xX,,. In our problem, a = 0,b = 1, f(x) = x’, and K(x, t) =e", 


    a. Show that the linear system

$$u(0) = f(0) + \frac{1}{2}\,[K(0, 0)u(0) + K(0, 1)u(1)],$$
$$u(1) = f(1) + \frac{1}{2}\,[K(1, 0)u(0) + K(1, 1)u(1)]$$

       must be solved when the Trapezoidal rule is used.

    b. Set up and solve the linear system that results when the Composite Trapezoidal rule is used with $n = 4$.

    c. Repeat part (b) using the Composite Simpson's rule.


6.2 Pivoting Strategies


In deriving Algorithm 6.1, we found that a row interchange was needed when one of the pivot elements $a_{kk}^{(k)}$ is 0. This row interchange has the form $(E_k) \leftrightarrow (E_p)$, where $p$ is the smallest integer greater than $k$ with $a_{pk}^{(k)} \neq 0$. To reduce round-off error, it is often necessary to perform row interchanges even when the pivot elements are not zero.

If $a_{kk}^{(k)}$ is small in magnitude compared to $a_{jk}^{(k)}$, then the magnitude of the multiplier

$$m_{jk} = \frac{a_{jk}^{(k)}}{a_{kk}^{(k)}}$$

will be much larger than 1. Round-off error introduced in the computation of one of the terms $a_{kl}^{(k)}$ is multiplied by $m_{jk}$ when computing $a_{jl}^{(k+1)}$, which compounds the original error.

Also, when performing the backward substitution for

$$x_k = \frac{a_{k,n+1}^{(k)} - \sum_{j=k+1}^{n} a_{kj}^{(k)} x_j}{a_{kk}^{(k)}},$$

with a small value of $a_{kk}^{(k)}$, any error in the numerator can be dramatically increased because of the division by $a_{kk}^{(k)}$. In our next example, we will see that even for small systems, round-off error can dominate the calculations.


Example 1 Apply Gaussian elimination to the system

$$E_1:\ 0.003000x_1 + 59.14x_2 = 59.17$$
$$E_2:\ 5.291x_1 - 6.130x_2 = 46.78,$$

using four-digit arithmetic with rounding, and compare the results to the exact solution $x_1 = 10.00$ and $x_2 = 1.000$.

Solution The first pivot element, $a_{11}^{(1)} = 0.003000$, is small, and its associated multiplier,

$$m_{21} = \frac{5.291}{0.003000} = 1763.66\overline{6},$$

rounds to the large number 1764. Performing $(E_2 - m_{21}E_1) \to (E_2)$ and the appropriate rounding gives the system

$$0.003000x_1 + 59.14x_2 \approx 59.17$$
$$-104300x_2 \approx -104400,$$

instead of the exact system, which is

$$0.003000x_1 + 59.14x_2 = 59.17$$
$$-104309.37\overline{6}\,x_2 = -104309.37\overline{6}.$$

The disparity in the magnitudes of $m_{21}a_{13}$ and $a_{23}$ has introduced round-off error, but the round-off error has not yet been propagated. Backward substitution yields

$$x_2 \approx 1.001,$$

which is a close approximation to the actual value, $x_2 = 1.000$. However, because of the small pivot $a_{11} = 0.003000$,

$$x_1 \approx \frac{59.17 - (59.14)(1.001)}{0.003000} = -10.00$$

contains the small error of 0.001 multiplied by

$$\frac{59.14}{0.003000} \approx 20000.$$

This ruins the approximation to the actual value $x_1 = 10.00$.

This is clearly a contrived example, and the graph in Figure 6.1 shows why the error can so easily occur. For larger systems it is much more difficult to predict in advance when devastating round-off error might occur.

Figure 6.1 (graph of $E_1$ and $E_2$, marking the approximation $(-10, 1.001)$ and the exact solution $(10, 1)$)


Partial Pivoting

Example 1 shows how difficulties can arise when the pivot element $a_{kk}^{(k)}$ is small relative to the entries $a_{ij}^{(k)}$, for $k \le i \le n$ and $k \le j \le n$. To avoid this problem, pivoting is performed by selecting an element $a_{pq}^{(k)}$ with a larger magnitude as the pivot, and interchanging the $k$th and $p$th rows. This can be followed by the interchange of the $k$th and $q$th columns, if necessary.

The simplest strategy is to select an element in the same column that is below the diagonal and has the largest absolute value; specifically, we determine the smallest $p \ge k$ such that

$$|a_{pk}^{(k)}| = \max_{k \le i \le n} |a_{ik}^{(k)}|$$

and perform $(E_k) \leftrightarrow (E_p)$. In this case no interchange of columns is used.


Example 2 Apply Gaussian elimination to the system

$$E_1:\ 0.003000x_1 + 59.14x_2 = 59.17$$
$$E_2:\ 5.291x_1 - 6.130x_2 = 46.78,$$

using partial pivoting and four-digit arithmetic with rounding, and compare the results to the exact solution $x_1 = 10.00$ and $x_2 = 1.000$.

Solution The partial-pivoting procedure first requires finding

$$\max\left\{|a_{11}^{(1)}|, |a_{21}^{(1)}|\right\} = \max\{|0.003000|, |5.291|\} = |5.291| = |a_{21}^{(1)}|.$$

This requires that the operation $(E_2) \leftrightarrow (E_1)$ be performed to produce the equivalent system

$$E_1:\ 5.291x_1 - 6.130x_2 = 46.78,$$
$$E_2:\ 0.003000x_1 + 59.14x_2 = 59.17.$$

The multiplier for this system is

$$m_{21} = \frac{a_{21}^{(1)}}{a_{11}^{(1)}} = \frac{0.003000}{5.291} = 0.0005670,$$

and the operation $(E_2 - m_{21}E_1) \to (E_2)$ reduces the system to

$$5.291x_1 - 6.130x_2 \approx 46.78,$$
$$59.14x_2 \approx 59.14.$$

The four-digit answers resulting from the backward substitution are the correct values $x_1 = 10.00$ and $x_2 = 1.000$.
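The four-digit computations in Examples 1 and 2 can be imitated with Python's `decimal` module. This is a sketch under stated assumptions: the helper name `solve_2x2` is hypothetical, every arithmetic result is rounded to four significant digits with round-half-up, and the only difference between the two runs is the order in which the equations are supplied.

```python
from decimal import Decimal, Context, ROUND_HALF_UP

# Every operation performed through ctx is rounded to 4 significant digits.
ctx = Context(prec=4, rounding=ROUND_HALF_UP)

def solve_2x2(e1, e2):
    """Eliminate x1 from e2 using e1 as the pivot row, then back-substitute.

    Each equation is a triple (a_i1, a_i2, b_i) of decimal strings.
    No pivoting is done; the caller chooses the row order.
    """
    a11, a12, b1 = (Decimal(s) for s in e1)
    a21, a22, b2 = (Decimal(s) for s in e2)
    m21 = ctx.divide(a21, a11)                            # multiplier
    a22 = ctx.subtract(a22, ctx.multiply(m21, a12))       # new a_22
    b2 = ctx.subtract(b2, ctx.multiply(m21, b1))          # new right side
    x2 = ctx.divide(b2, a22)
    x1 = ctx.divide(ctx.subtract(b1, ctx.multiply(a12, x2)), a11)
    return x1, x2

E1 = ("0.003000", "59.14", "59.17")
E2 = ("5.291", "-6.130", "46.78")

no_pivot = solve_2x2(E1, E2)   # small pivot 0.003000, as in Example 1
pivoted = solve_2x2(E2, E1)    # rows interchanged first, as in Example 2
```

With the original row order the sketch reproduces the poor approximation of Example 1, and with the rows interchanged it reproduces the correct four-digit values of Example 2.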


The technique just described is called partial pivoting (or maximal column pivoting) 
and is detailed in Algorithm 6.2. The actual row interchanging is simulated in the algorithm 
by interchanging the values of NROW in Step 5. 


Gaussian Elimination with Partial Pivoting (Algorithm 6.2)

To solve the $n \times n$ linear system

$$E_1:\ a_{11}x_1 + a_{12}x_2 + \cdots + a_{1n}x_n = a_{1,n+1}$$
$$E_2:\ a_{21}x_1 + a_{22}x_2 + \cdots + a_{2n}x_n = a_{2,n+1}$$
$$\vdots$$
$$E_n:\ a_{n1}x_1 + a_{n2}x_2 + \cdots + a_{nn}x_n = a_{n,n+1}$$

INPUT number of unknowns and equations $n$; augmented matrix $A = [a_{ij}]$ where $1 \le i \le n$ and $1 \le j \le n + 1$.
OUTPUT solution $x_1, \ldots, x_n$ or message that the linear system has no unique solution.

Step 1 For $i = 1, \ldots, n$ set NROW($i$) = $i$. (Initialize row pointer.)

Step 2 For $i = 1, \ldots, n - 1$ do Steps 3-6. (Elimination process.)

    Step 3 Let $p$ be the smallest integer with $i \le p \le n$ and
           $|a(\text{NROW}(p), i)| = \max_{i \le j \le n} |a(\text{NROW}(j), i)|$.
           (Notation: $a(\text{NROW}(i), j) \equiv a_{\text{NROW}_i, j}$.)

    Step 4 If $a(\text{NROW}(p), i) = 0$ then OUTPUT ('no unique solution exists');
           STOP.

    Step 5 If NROW($i$) $\neq$ NROW($p$) then set NCOPY = NROW($i$);
                                              NROW($i$) = NROW($p$);
                                              NROW($p$) = NCOPY.
           (Simulated row interchange.)

    Step 6 For $j = i + 1, \ldots, n$ do Steps 7 and 8.
        Step 7 Set $m(\text{NROW}(j), i) = a(\text{NROW}(j), i)/a(\text{NROW}(i), i)$.
        Step 8 Perform $(E_{\text{NROW}(j)} - m(\text{NROW}(j), i) \cdot E_{\text{NROW}(i)}) \to (E_{\text{NROW}(j)})$.

Step 9 If $a(\text{NROW}(n), n) = 0$ then OUTPUT ('no unique solution exists');
       STOP.

Step 10 Set $x_n = a(\text{NROW}(n), n + 1)/a(\text{NROW}(n), n)$.
        (Start backward substitution.)

Step 11 For $i = n - 1, \ldots, 1$
        set $x_i = \dfrac{a(\text{NROW}(i), n + 1) - \sum_{j=i+1}^{n} a(\text{NROW}(i), j) \cdot x_j}{a(\text{NROW}(i), i)}$.

Step 12 OUTPUT $(x_1, \ldots, x_n)$; (Procedure completed successfully.)
        STOP.
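A Python sketch of Algorithm 6.2 follows. It is illustrative only: the function name `gauss_partial_pivoting` is hypothetical, indices are 0-based, and ordinary double-precision floats are used instead of finite-digit arithmetic. The list `nrow` plays the role of NROW, so rows are never physically moved.

```python
def gauss_partial_pivoting(aug):
    """Gaussian elimination with partial pivoting via a row-pointer list.

    `aug` is an n x (n+1) augmented matrix. Returns the solution list,
    or None if no unique solution exists.
    """
    a = [[float(v) for v in row] for row in aug]      # work on a copy
    n = len(a)
    nrow = list(range(n))                             # Step 1: row pointer
    for i in range(n - 1):                            # Step 2
        # Step 3: max() returns the first maximizer, i.e. the smallest p
        p = max(range(i, n), key=lambda r: abs(a[nrow[r]][i]))
        if a[nrow[p]][i] == 0:                        # Step 4
            return None
        nrow[i], nrow[p] = nrow[p], nrow[i]           # Step 5: simulated swap
        for j in range(i + 1, n):                     # Steps 6-8
            m = a[nrow[j]][i] / a[nrow[i]][i]
            for k in range(i, n + 1):
                a[nrow[j]][k] -= m * a[nrow[i]][k]
    if a[nrow[n - 1]][n - 1] == 0:                    # Step 9
        return None
    x = [0.0] * n
    for i in range(n - 1, -1, -1):                    # Steps 10 and 11
        s = sum(a[nrow[i]][j] * x[j] for j in range(i + 1, n))
        x[i] = (a[nrow[i]][n] - s) / a[nrow[i]][i]
    return x                                          # Step 12

x = gauss_partial_pivoting([[0.003000, 59.14, 59.17],
                            [5.291, -6.130, 46.78]])
```

Swapping two entries of `nrow` costs the same regardless of $n$, which is the reason the algorithm simulates the interchange instead of copying whole rows.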


Each multiplier $m_{ji}$ in the partial pivoting algorithm has magnitude less than or equal to 1. Although this strategy is sufficient for many linear systems, situations do arise when it is inadequate.

Illustration The linear system

$$E_1:\ 30.00x_1 + 591400x_2 = 591700,$$
$$E_2:\ 5.291x_1 - 6.130x_2 = 46.78,$$

is the same as that in Examples 1 and 2 except that all the entries in the first equation have been multiplied by $10^4$. The partial pivoting procedure described in Algorithm 6.2 with four-digit arithmetic leads to the same results as obtained in Example 1. The maximal value in the first column is 30.00, and the multiplier

$$m_{21} = \frac{5.291}{30.00} = 0.1764$$

leads to the system

$$30.00x_1 + 591400x_2 \approx 591700,$$
$$-104300x_2 \approx -104400,$$

which has the same inaccurate solutions as in Example 1: $x_2 \approx 1.001$ and $x_1 \approx -10.00$.


Scaled Partial Pivoting 


Scaled partial pivoting (or scaled-column pivoting) is needed for the system in the Illustration. It places the element in the pivot position that is largest relative to the entries in its row. The first step in this procedure is to define a scale factor $s_i$ for each row as

$$s_i = \max_{1 \le j \le n} |a_{ij}|.$$

If we have $s_i = 0$ for some $i$, then the system has no unique solution since all entries in the $i$th row are 0. Assuming that this is not the case, the appropriate row interchange to place zeros in the first column is determined by choosing the least integer $p$ with

$$\frac{|a_{p1}|}{s_p} = \max_{1 \le k \le n} \frac{|a_{k1}|}{s_k}$$

and performing $(E_1) \leftrightarrow (E_p)$. The effect of scaling is to ensure that the largest element in each row has a relative magnitude of 1 before the comparison for row interchange is performed.

In a similar manner, before eliminating the variable $x_i$ using the operations

$$E_k - m_{ki}E_i, \quad \text{for } k = i + 1, \ldots, n,$$

we select the smallest integer $p \ge i$ with

$$\frac{|a_{pi}|}{s_p} = \max_{i \le k \le n} \frac{|a_{ki}|}{s_k}$$

and perform the row interchange $(E_i) \leftrightarrow (E_p)$ if $i \neq p$. The scale factors $s_1, \ldots, s_n$ are computed only once, at the start of the procedure. They are row dependent, so they must also be interchanged when row interchanges are performed.


Illustration Applying scaled partial pivoting to the previous Illustration gives

$$s_1 = \max\{|30.00|, |591400|\} = 591400$$

and

$$s_2 = \max\{|5.291|, |-6.130|\} = 6.130.$$

Consequently

$$\frac{|a_{11}|}{s_1} = \frac{30.00}{591400} = 0.5073 \times 10^{-4}, \qquad \frac{|a_{21}|}{s_2} = \frac{5.291}{6.130} = 0.8631,$$

and the interchange $(E_1) \leftrightarrow (E_2)$ is made.

Applying Gaussian elimination to the new system

$$5.291x_1 - 6.130x_2 = 46.78$$
$$30.00x_1 + 591400x_2 = 591700$$

produces the correct results: $x_1 = 10.00$ and $x_2 = 1.000$.


Algorithm 6.3 implements scaled partial pivoting. 


Gaussian Elimination with Scaled Partial Pivoting (Algorithm 6.3)

The only steps in this algorithm that differ from those of Algorithm 6.2 are:

Step 1 For $i = 1, \ldots, n$ set $s_i = \max_{1 \le j \le n} |a_{ij}|$;
       if $s_i = 0$ then OUTPUT ('no unique solution exists');
                        STOP.
       set NROW($i$) = $i$.

Step 2 For $i = 1, \ldots, n - 1$ do Steps 3-6. (Elimination process.)

    Step 3 Let $p$ be the smallest integer with $i \le p \le n$ and

$$\frac{|a(\text{NROW}(p), i)|}{s(\text{NROW}(p))} = \max_{i \le j \le n} \frac{|a(\text{NROW}(j), i)|}{s(\text{NROW}(j))}.$$
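The two modified steps can be sketched as small Python helpers. The names `scale_factors` and `scaled_pivot_row` are hypothetical, and indices are 0-based. Because the scale factors are looked up through the row pointer, as in $s(\text{NROW}(j))$ above, they do not need to be moved when a row interchange is simulated.

```python
def scale_factors(aug):
    """Step 1: s_i = max |a_ij| over the coefficient columns only
    (the right-hand-side column is excluded). Returns None if some s_i is 0."""
    n = len(aug)
    s = [max(abs(v) for v in row[:n]) for row in aug]
    return None if any(si == 0 for si in s) else s

def scaled_pivot_row(a, s, nrow, i):
    """Step 3: smallest position p >= i maximizing |a(NROW(p), i)| / s(NROW(p))."""
    return max(range(i, len(nrow)),
               key=lambda r: abs(a[nrow[r]][i]) / s[nrow[r]])

# System from the Illustration: partial pivoting would keep row 0 as pivot.
A = [[30.00, 591400.0, 591700.0],
     [5.291, -6.130, 46.78]]
s = scale_factors(A)
p = scaled_pivot_row(A, s, [0, 1], 0)
```

For the Illustration the ratios are about $0.5 \times 10^{-4}$ for the first row and about 0.86 for the second, so the sketch selects the second row as the pivot row.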


The next example demonstrates using Maple and the LinearAlgebra library to perform scaled partial pivoting with finite-digit rounding arithmetic.

Example 3 Solve the linear system using three-digit rounding arithmetic in Maple with the LinearAlgebra library.

$$2.11x_1 - 4.21x_2 + 0.921x_3 = 2.01,$$
$$4.01x_1 + 10.2x_2 - 1.12x_3 = -3.09,$$
$$1.09x_1 + 0.987x_2 + 0.832x_3 = 4.21.$$

Solution To obtain three-digit rounding arithmetic, enter

Digits := 3

We have $s_1 = 4.21$, $s_2 = 10.2$, and $s_3 = 1.09$. So

$$\frac{|a_{11}|}{s_1} = \frac{2.11}{4.21} = 0.501, \qquad \frac{|a_{21}|}{s_2} = \frac{4.01}{10.2} = 0.393, \qquad \text{and} \qquad \frac{|a_{31}|}{s_3} = \frac{1.09}{1.09} = 1.$$

Next we load the LinearAlgebra library.

with(LinearAlgebra)

The augmented matrix AA is defined by

AA := Matrix([[2.11, -4.21, 0.921, 2.01], [4.01, 10.2, -1.12, -3.09], [1.09, 0.987, 0.832, 4.21]])

which gives

$$\begin{bmatrix} 2.11 & -4.21 & .921 & 2.01 \\ 4.01 & 10.2 & -1.12 & -3.09 \\ 1.09 & .987 & .832 & 4.21 \end{bmatrix}$$

Since $|a_{31}|/s_3$ is largest, we perform $(E_1) \leftrightarrow (E_3)$ using

AA1 := RowOperation(AA, [1, 3])

to obtain

$$\begin{bmatrix} 1.09 & .987 & .832 & 4.21 \\ 4.01 & 10.2 & -1.12 & -3.09 \\ 2.11 & -4.21 & .921 & 2.01 \end{bmatrix}$$

Compute the multipliers

m21 := AA1[2, 1] / AA1[1, 1]; m31 := AA1[3, 1] / AA1[1, 1]

giving

3.68
1.94

Perform the first two eliminations using

AA2 := RowOperation(AA1, [2, 1], -m21): AA3 := RowOperation(AA2, [3, 1], -m31)

to produce

$$\begin{bmatrix} 1.09 & .987 & .832 & 4.21 \\ 0 & 6.57 & -4.18 & -18.6 \\ 0 & -6.12 & -.689 & -6.16 \end{bmatrix}$$

Since

$$\frac{|a_{22}|}{s_2} = \frac{6.57}{10.2} = 0.644 < \frac{|a_{32}|}{s_3} = \frac{6.12}{4.21} = 1.45,$$

we perform

AA4 := RowOperation(AA3, [2, 3])

giving

$$\begin{bmatrix} 1.09 & .987 & .832 & 4.21 \\ 0 & -6.12 & -.689 & -6.16 \\ 0 & 6.57 & -4.18 & -18.6 \end{bmatrix}$$

The multiplier $m_{32}$ is computed by

m32 := AA4[3, 2] / AA4[2, 2]

-1.07

and the elimination step

AA5 := RowOperation(AA4, [3, 2], -m32)

results in the matrix

$$\begin{bmatrix} 1.09 & .987 & .832 & 4.21 \\ 0 & -6.12 & -.689 & -6.16 \\ 0 & .02 & -4.92 & -25.2 \end{bmatrix}$$

We cannot use BackwardSubstitute on this matrix because of the entry .02 in the last row of the second column, which Maple knows as the (3, 2) position. This entry is nonzero due to rounding, but we can remedy this minor problem by setting it to 0 with the command

AA5[3, 2] := 0

You can verify this is correct with the command evalm(AA5).

Finally, backward substitution gives the solution x, which to 3 decimal digits is $x_1 = -0.436$, $x_2 = 0.430$, and $x_3 = 5.12$.


The first additional computations required for scaled partial pivoting result from the determination of the scale factors; there are $(n - 1)$ comparisons for each of the $n$ rows, for a total of

$$n(n - 1) \text{ comparisons.}$$

To determine the correct first interchange, $n$ divisions are performed, followed by $n - 1$ comparisons. So the first interchange determination adds

$$n \text{ divisions and } (n - 1) \text{ comparisons.}$$

The scaling factors are computed only once, so the second step requires

$$(n - 1) \text{ divisions and } (n - 2) \text{ comparisons.}$$

We proceed in a similar manner until there are zeros below the main diagonal in all but the $n$th row. The final step requires that we perform

$$2 \text{ divisions and } 1 \text{ comparison.}$$

As a consequence, scaled partial pivoting adds a total of

$$n(n - 1) + \sum_{k=1}^{n-1} k = n(n - 1) + \frac{(n - 1)n}{2} = \frac{3}{2}n(n - 1) \text{ comparisons} \qquad (6.7)$$

and

$$\sum_{k=2}^{n} k = \left(\sum_{k=1}^{n} k\right) - 1 = \frac{n(n + 1)}{2} - 1 = \frac{1}{2}(n - 1)(n + 2) \text{ divisions}$$

to the Gaussian elimination procedure. The time required to perform a comparison is about the same as an addition/subtraction. Since the total time to perform the basic Gaussian elimination procedure is $O(n^3/3)$ multiplications/divisions and $O(n^3/3)$ additions/subtractions, scaled partial pivoting does not add significantly to the computational time required to solve a system for large values of $n$.

To emphasize the importance of choosing the scale factors only once, consider the amount of additional computation that would be required if the procedure were modified so that new scale factors were determined each time a row interchange decision was to be made. In this case, the term $n(n - 1)$ in Eq. (6.7) would be replaced by

$$\sum_{k=2}^{n} k(k - 1) = \frac{1}{3}n(n^2 - 1).$$

As a consequence, this pivoting technique would add $O(n^3/3)$ comparisons, in addition to the $[n(n + 1)/2] - 1$ divisions.
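The closed forms above can be checked against a direct tally. The sketch below uses hypothetical function names and simply adds up the comparisons and divisions described for each step.

```python
def scaled_pivoting_overhead(n):
    """Extra (comparisons, divisions) when scale factors are computed once."""
    comparisons = n * (n - 1)          # n rows, n - 1 comparisons each
    divisions = 0
    for k in range(n, 1, -1):          # k rows remain at each pivot search
        divisions += k                 # one ratio |a| / s per remaining row
        comparisons += k - 1           # find the largest of k ratios
    return comparisons, divisions

def rescaling_comparisons(n):
    """Comparisons spent on scale factors if they were recomputed
    before every interchange decision (k rows of k entries each)."""
    return sum(k * (k - 1) for k in range(2, n + 1))

overhead_10 = scaled_pivoting_overhead(10)
rescale_10 = rescaling_comparisons(10)
```

For $n = 10$ the once-only scheme costs 135 comparisons and 54 divisions, while recomputing the scale factors would by itself cost 330 comparisons, which illustrates the cubic growth noted above.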


Complete Pivoting 


Pivoting can incorporate the interchange of both rows and columns. Complete (or maximal) pivoting at the $k$th step searches all the entries $a_{ij}$, for $i = k, k + 1, \ldots, n$ and $j = k, k + 1, \ldots, n$, to find the entry with the largest magnitude. Both row and column interchanges are performed to bring this entry to the pivot position. The first step of complete pivoting requires that $n^2 - 1$ comparisons be performed, the second step requires $(n - 1)^2 - 1$ comparisons, and so on. The total additional time required to incorporate complete pivoting into Gaussian elimination is

$$\sum_{k=2}^{n} (k^2 - 1) = \frac{n(n - 1)(2n + 5)}{6}$$

comparisons. Complete pivoting is, consequently, the strategy recommended only for systems where accuracy is essential and the amount of execution time needed for this method can be justified.
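The search performed at each step of complete pivoting can be sketched as follows. The function names are hypothetical, indices are 0-based, and the sketch shows only the search and its comparison count, not the row and column interchanges or the bookkeeping needed to reorder the unknowns afterward.

```python
def complete_pivot(a, k):
    """Return (row, column) of the largest-magnitude entry in the
    trailing square block a[k..n-1][k..n-1] of an n x (n+1) array."""
    n = len(a)
    return max(((i, j) for i in range(k, n) for j in range(k, n)),
               key=lambda ij: abs(a[ij[0]][ij[1]]))

def complete_pivoting_comparisons(n):
    """Total comparisons: a block with k*k entries needs k*k - 1."""
    return sum(k * k - 1 for k in range(2, n + 1))

A = [[30.00, 591400.0, 591700.0],
     [5.291, -6.130, 46.78]]
pivot = complete_pivot(A, 0)
```

For the two-equation system used in the earlier Illustration, the search selects the entry 591400 in the first row and second column, so a column interchange would be required in addition to any row interchange.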


EXERCISE SET 6.2


1. Find the row interchanges that are required to solve the following linear systems using 
Algorithm 6.1. 


a. x—-S5xy+ B=7, b. X+$H- B= 1, 
10x; + 20x; = 6, xX, +x + 43 = 2, 
5x, -— »%=4. 2x) — X_2 + 2x3 = 3. 
c. 2x, — 3x2 + 2x3 =5, d. x2 +x%3 = 6, 
—4x, + 2x. — 6x3 = 14, xX, — 2x) — x3 = 4, 
$2x_1 + 2x_2 + 4x_3 = 8$.        $x_1 - x_2 + x_3 = 5$.

2. Find the row interchanges that are required to solve the following linear systems using Algorithm 6.1.


a 13x, +17m%4+ »%=5, b. Xp+t 2% — 23 =O0, 
xX. + 19x; = 1, 12x, — x3 = 4, 
12y%- +x=0. 2x; + xy +43=5. 
ce 5x, + x — 6x3 =7, d. Xy—- % +23 =5, 
2x,+ mM—-— 4% = 8, 7x; + 5x2 — x3 = 8, 
6x, + 12%+ 13 = 9. 2x, + x. +%3=7. 


3. Repeat Exercise 1 using Algorithm 6.2.
4. Repeat Exercise 2 using Algorithm 6.2.
5. Repeat Exercise 1 using Algorithm 6.3.
6. Repeat Exercise 2 using Algorithm 6.3.
7. Repeat Exercise 1 using complete pivoting.
8. Repeat Exercise 2 using complete pivoting.


9. Use Gaussian elimination and three-digit chopping arithmetic to solve the following linear systems, and compare the approximations to the actual solution.

   a. $0.03x_1 + 58.9x_2 = 59.2$,
      $5.31x_1 - 6.10x_2 = 47.0$.
      Actual solution $[10, 1]$.

   b. $3.03x_1 - 12.1x_2 + 14x_3 = -119$,
      $-3.03x_1 + 12.1x_2 - 7x_3 = 120$,
      $6.11x_1 - 14.2x_2 + 21x_3 = -139$.
      Actual solution $[0, 10, \frac{1}{7}]$.

   c. $1.19x_1 + 2.11x_2 - 100x_3 + x_4 = 1.12$,
      $14.2x_1 - 0.122x_2 + 12.2x_3 - x_4 = 3.44$,
      $100x_2 - 99.9x_3 + x_4 = 2.15$,
      $15.3x_1 + 0.110x_2 - 13.1x_3 - x_4 = 4.16$.
      Actual solution $[0.176, 0.0126, -0.0206, -1.18]$.
d. WX — eX) + 2x3 — V3x4 = VII, 

Wx, + eX) — e7x3+ 3x4 = 0, 

V5xq- V6. + 43 — V 2x = 7, 

Wx, + ex) = 7x3 + 5X4 = /2. 

Actual solution [0.788, —3.12, 0.167, 4.55]. 


10. Use Gaussian elimination and three-digit chopping arithmetic to solve the following linear systems, and compare the approximations to the actual solution.

   a. $58.9x_1 + 0.03x_2 = 59.2$,
      $-6.10x_1 + 5.31x_2 = 47.0$.
      Actual solution $[1, 10]$.

   b. $3.3330x_1 + 15920x_2 + 10.333x_3 = 7953$,
      $2.2220x_1 + 16.710x_2 + 9.6120x_3 = 0.965$,
      $-1.5611x_1 + 5.1792x_2 - 1.6855x_3 = 2.714$.
      Actual solution $[1, 0.5, -1]$.
c. 2.12x, — 2.12x. +51.343 + 100x4 = 7, 


0.333x, — 0.333x2 — 12.2x3 + 19.7x4 = V2, 
6.19x; + 8.20x2 — 1.00x3 — 2.01x, = 0, 
—5.73x,; + 6.12x. + x3 x=. 
Actual solution [0.0998, —0.0683, —0.0363, 0.0465]. 
doo am tV2xm—-— w+ x4 =0, 
exy— Xt Rt OAK = 1, 
mt m—V3x3+ x4 =2, 
—X,- yt RZ V5x4 = 3. 
Actual solution [1.35, —4.68, —4.03, — 1.66]. 
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11. Repeat Exercise 9 using three-digit rounding arithmetic. 

12. Repeat Exercise 10 using three-digit rounding arithmetic. 

13. Repeat Exercise 9 using Gaussian elimination with partial pivoting. 

14. Repeat Exercise 10 using Gaussian elimination with partial pivoting. 

15. Repeat Exercise 9 using Gaussian elimination with partial pivoting and three-digit rounding arithmetic. 


16. Repeat Exercise 10 using Gaussian elimination with partial pivoting and three-digit rounding arithmetic.


17. Repeat Exercise 9 using Gaussian elimination with scaled partial pivoting. 
18. Repeat Exercise 10 using Gaussian elimination with scaled partial pivoting. 


19. Repeat Exercise 9 using Gaussian elimination with scaled partial pivoting and three-digit rounding 
arithmetic. 


20. Repeat Exercise 10 using Gaussian elimination with scaled partial pivoting and three-digit rounding 
arithmetic. 


21. Repeat Exercise 9 using Algorithm 6.1 in Maple with Digits:= 10. 

22. Repeat Exercise 10 using Algorithm 6.1 in Maple with Digits:= 10. 
23. Repeat Exercise 9 using Algorithm 6.2 in Maple with Digits:= 10. 

24. Repeat Exercise 10 using Algorithm 6.2 in Maple with Digits:= 10. 
25. Repeat Exercise 9 using Algorithm 6.3 in Maple with Digits:= 10. 

26. Repeat Exercise 10 using Algorithm 6.3 in Maple with Digits:= 10. 
27. Repeat Exercise 9 using Gaussian elimination with complete pivoting. 
28. Repeat Exercise 10 using Gaussian elimination with complete pivoting. 


29. Repeat Exercise 9 using Gaussian elimination with complete pivoting and three-digit rounding arithmetic.


30. Repeat Exercise 10 using Gaussian elimination with complete pivoting and three-digit rounding 
arithmetic. 


31. Suppose that

$2x_1 + x_2 + 3x_3 = 1$,
$4x_1 + 6x_2 + 8x_3 = 5$,
$6x_1 + \alpha x_2 + 10x_3 = 5$,

with $|\alpha| < 10$. For which of the following values of $\alpha$ will there be no row interchange required when
solving this system using scaled partial pivoting?
a. $\alpha = 6$   b. $\alpha = 9$   c. $\alpha = -3$

32. Construct an algorithm for the complete pivoting procedure discussed in the text. 

33. Use the complete pivoting algorithm to repeat Exercise 9 in Maple with Digits:= 10.

34. Use the complete pivoting algorithm to repeat Exercise 10 in Maple with Digits:= 10.


6.3 Linear Algebra and Matrix Inversion


Matrices were introduced in Section 6.1 as a convenient method for expressing and manip- 
ulating linear systems. In this section we consider some algebra associated with matrices 
and show how it can be used to solve problems involving linear systems. 


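Definition 6.2 Two matrices A and B are equal if they have the same number of rows and columns, say
$n \times m$, and if $a_{ij} = b_{ij}$, for each $i = 1, 2, \dots, n$ and $j = 1, 2, \dots, m$.

This definition means, for example, that

$$\begin{bmatrix} 2 & -1 & 7 \\ 3 & 1 & 0 \end{bmatrix} \neq \begin{bmatrix} 2 & 3 \\ -1 & 1 \\ 7 & 0 \end{bmatrix},$$

because they differ in dimension.


Matrix Arithmetic


Two important operations performed on matrices are the sum of two matrices and the
multiplication of a matrix by a real number.


Definition 6.3 If A and B are both $n \times m$ matrices, then the sum of A and B, denoted A + B, is the $n \times m$
matrix whose entries are $a_{ij} + b_{ij}$, for each $i = 1, 2, \dots, n$ and $j = 1, 2, \dots, m$.


Definition 6.4 If A is an $n \times m$ matrix and $\lambda$ is a real number, then the scalar multiplication of $\lambda$ and
A, denoted $\lambda A$, is the $n \times m$ matrix whose entries are $\lambda a_{ij}$, for each $i = 1, 2, \dots, n$ and
$j = 1, 2, \dots, m$.


Example 1 Determine A + B and $\lambda A$ when

$$A = \begin{bmatrix} 2 & -1 & 7 \\ 3 & 1 & 0 \end{bmatrix}, \quad B = \begin{bmatrix} 4 & 2 & -8 \\ 0 & 1 & 6 \end{bmatrix}, \quad \text{and} \quad \lambda = -2.$$

Solution We have

$$A + B = \begin{bmatrix} 2+4 & -1+2 & 7-8 \\ 3+0 & 1+1 & 0+6 \end{bmatrix} = \begin{bmatrix} 6 & 1 & -1 \\ 3 & 2 & 6 \end{bmatrix}$$

and

$$\lambda A = \begin{bmatrix} -2(2) & -2(-1) & -2(7) \\ -2(3) & -2(1) & -2(0) \end{bmatrix} = \begin{bmatrix} -4 & 2 & -14 \\ -6 & -2 & 0 \end{bmatrix}.$$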
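The two operations just defined can be sketched in a few lines of Python. This is only an illustration: the function names `mat_add` and `scalar_mul`, the list-of-rows storage, and the sample matrices are assumptions made here, not part of the text.

```python
def mat_add(A, B):
    """Entrywise sum of two matrices stored as lists of rows."""
    same_shape = len(A) == len(B) and all(
        len(ra) == len(rb) for ra, rb in zip(A, B))
    if not same_shape:
        # The sum is defined only when the dimensions agree.
        raise ValueError("A + B requires matrices of the same dimension")
    return [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]


def scalar_mul(lam, A):
    """Multiply every entry of A by the real number lam."""
    return [[lam * a for a in row] for row in A]


# Sample 2 x 3 matrices (illustrative values).
A = [[2, -1, 7], [3, 1, 0]]
B = [[4, 2, -8], [0, 1, 6]]

S = mat_add(A, B)        # entries a_ij + b_ij
L = scalar_mul(-2, A)    # entries -2 * a_ij
```

Attempting `mat_add(A, [[1, 2], [3, 4]])` raises an error, which mirrors the requirement that both matrices have the same dimension.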


We have the following general properties for matrix addition and scalar multiplication.
These properties are sufficient to classify the set of all $n \times m$ matrices with real entries as
a vector space over the field of real numbers.

We let O denote a matrix all of whose entries are 0 and $-A$ denote the matrix whose
entries are $-a_{ij}$.


Theorem 6.5 Let A, B, and C be $n \times m$ matrices and $\lambda$ and $\mu$ be real numbers. The following properties
of addition and scalar multiplication hold:

(i) $A + B = B + A$,
(ii) $(A + B) + C = A + (B + C)$,
(iii) $A + O = O + A = A$,
(iv) $A + (-A) = -A + A = O$,
(v) $\lambda(A + B) = \lambda A + \lambda B$,
(vi) $(\lambda + \mu)A = \lambda A + \mu A$,
(vii) $\lambda(\mu A) = (\lambda\mu)A$,
(viii) $1A = A$.

All these properties follow from similar results concerning the real numbers.


Matrix-Vector Products


The product of matrices can also be defined in certain instances. We will first consider the
product of an $n \times m$ matrix and an $m \times 1$ column vector.


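Definition 6.6 Let A be an $n \times m$ matrix and b an m-dimensional column vector. The matrix-vector
product of A and b, denoted Ab, is an n-dimensional column vector given by

$$A\mathbf{b} = \begin{bmatrix} a_{11} & a_{12} & \cdots & a_{1m} \\ a_{21} & a_{22} & \cdots & a_{2m} \\ \vdots & \vdots & & \vdots \\ a_{n1} & a_{n2} & \cdots & a_{nm} \end{bmatrix} \begin{bmatrix} b_1 \\ b_2 \\ \vdots \\ b_m \end{bmatrix} = \begin{bmatrix} \sum_{i=1}^{m} a_{1i} b_i \\ \sum_{i=1}^{m} a_{2i} b_i \\ \vdots \\ \sum_{i=1}^{m} a_{ni} b_i \end{bmatrix}.$$

For this product to be defined the number of columns of the matrix A must match the
number of rows of the vector b, and the result is another column vector with the number of
rows matching the number of rows in the matrix.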


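Example 2 Determine the product Ab if

$$A = \begin{bmatrix} 3 & 2 \\ -1 & 1 \\ 6 & 4 \end{bmatrix} \quad \text{and} \quad \mathbf{b} = \begin{bmatrix} 3 \\ -1 \end{bmatrix}.$$

Solution Because A has dimension $3 \times 2$ and b has dimension $2 \times 1$, the product is defined
and is a vector with three rows. These are

$$3(3) + 2(-1) = 7, \quad (-1)(3) + 1(-1) = -4, \quad \text{and} \quad 6(3) + 4(-1) = 14.$$

That is,

$$A\mathbf{b} = \begin{bmatrix} 3 & 2 \\ -1 & 1 \\ 6 & 4 \end{bmatrix} \begin{bmatrix} 3 \\ -1 \end{bmatrix} = \begin{bmatrix} 7 \\ -4 \\ 14 \end{bmatrix}.$$


The introduction of the matrix-vector product permits us to view the linear system

$$\begin{aligned} a_{11}x_1 + a_{12}x_2 + \cdots + a_{1n}x_n &= b_1, \\ a_{21}x_1 + a_{22}x_2 + \cdots + a_{2n}x_n &= b_2, \\ &\;\;\vdots \\ a_{n1}x_1 + a_{n2}x_2 + \cdots + a_{nn}x_n &= b_n, \end{aligned}$$

as the matrix equation

$$A\mathbf{x} = \mathbf{b},$$

where

$$A = \begin{bmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ a_{21} & a_{22} & \cdots & a_{2n} \\ \vdots & \vdots & & \vdots \\ a_{n1} & a_{n2} & \cdots & a_{nn} \end{bmatrix}, \quad \mathbf{x} = \begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{bmatrix}, \quad \text{and} \quad \mathbf{b} = \begin{bmatrix} b_1 \\ b_2 \\ \vdots \\ b_n \end{bmatrix},$$

because all the entries in the product Ax must match the corresponding entries in the vector
b. In essence, then, an $n \times m$ matrix is a function with domain the set of m-dimensional
column vectors and range a subset of the n-dimensional column vectors.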
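The matrix-vector product can be sketched directly from its definition. The helper name `mat_vec` and the list-of-rows representation are assumptions made for illustration; the data are those of Example 2.

```python
def mat_vec(A, b):
    """Return the product Ab for an n x m matrix A and a vector b of length m."""
    if any(len(row) != len(b) for row in A):
        # The number of columns of A must match the number of rows of b.
        raise ValueError("columns of A must match the length of b")
    # Entry i of the result is the sum of a_ik * b_k over k = 1, ..., m.
    return [sum(a * x for a, x in zip(row, b)) for row in A]


A = [[3, 2], [-1, 1], [6, 4]]   # 3 x 2
b = [3, -1]                     # 2 x 1
Ab = mat_vec(A, b)              # a vector with three rows
```

Viewed this way, `mat_vec(A, ...)` is a function that takes vectors of length m to vectors of length n, which is the interpretation given at the end of the passage above.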
Matrix-Matrix Products


We can use this matrix-vector multiplication to define general matrix-matrix multiplication.


Definition 6.7 Let A be an $n \times m$ matrix and B an $m \times p$ matrix. The matrix product of A and B, denoted
AB, is an $n \times p$ matrix C whose entries $c_{ij}$ are

$$c_{ij} = \sum_{k=1}^{m} a_{ik} b_{kj} = a_{i1}b_{1j} + a_{i2}b_{2j} + \cdots + a_{im}b_{mj},$$

for each $i = 1, 2, \dots, n$ and $j = 1, 2, \dots, p$.


The computation of $c_{ij}$ can be viewed as the multiplication of the entries of the ith row
of A with corresponding entries in the jth column of B, followed by a summation; that is,

$$[a_{i1}, a_{i2}, \dots, a_{im}] \begin{bmatrix} b_{1j} \\ b_{2j} \\ \vdots \\ b_{mj} \end{bmatrix} = c_{ij},$$

where

$$c_{ij} = a_{i1}b_{1j} + a_{i2}b_{2j} + \cdots + a_{im}b_{mj} = \sum_{k=1}^{m} a_{ik} b_{kj}.$$

This explains why the number of columns of A must equal the number of rows of B for the
product AB to be defined.
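The following example should serve to clarify the matrix multiplication process.


Example 3 Determine all possible products of the matrices

$$A = \begin{bmatrix} 3 & 2 \\ -1 & 1 \\ 1 & 4 \end{bmatrix}, \quad B = \begin{bmatrix} 2 & 1 & -1 \\ 3 & 1 & 2 \end{bmatrix}, \quad C = \begin{bmatrix} 2 & 1 & 0 & 1 \\ -1 & 3 & 2 & 1 \\ 1 & 1 & 2 & 0 \end{bmatrix}, \quad \text{and} \quad D = \begin{bmatrix} 1 & -1 \\ 2 & -1 \end{bmatrix}.$$

Solution The sizes of the matrices are

A: $3 \times 2$, B: $2 \times 3$, C: $3 \times 4$, and D: $2 \times 2$.

The products that can be defined, and their dimensions, are:

AB: $3 \times 3$, BA: $2 \times 2$, AD: $3 \times 2$, BC: $2 \times 4$, DB: $2 \times 3$, and DD: $2 \times 2$.

These products are

$$AB = \begin{bmatrix} 12 & 5 & 1 \\ 1 & 0 & 3 \\ 14 & 5 & 7 \end{bmatrix}, \quad BA = \begin{bmatrix} 4 & 1 \\ 10 & 15 \end{bmatrix}, \quad AD = \begin{bmatrix} 7 & -5 \\ 1 & 0 \\ 9 & -5 \end{bmatrix},$$

$$BC = \begin{bmatrix} 2 & 4 & 0 & 3 \\ 7 & 8 & 6 & 4 \end{bmatrix}, \quad DB = \begin{bmatrix} -1 & 0 & -3 \\ 1 & 1 & -4 \end{bmatrix}, \quad \text{and} \quad DD = \begin{bmatrix} -1 & 0 \\ 0 & -1 \end{bmatrix}.$$

Notice that although the matrix products AB and BA are both defined, their results are
very different; they do not even have the same dimension. In mathematical language, we say
that the matrix product operation is not commutative; that is, products in reverse order can
differ. This is the case even when both products are defined and are of the same dimension.
Almost any example will show this, for example,

$$\begin{bmatrix} 1 & 1 \\ 1 & 0 \end{bmatrix} \begin{bmatrix} 0 & 1 \\ 1 & 1 \end{bmatrix} = \begin{bmatrix} 1 & 2 \\ 0 & 1 \end{bmatrix} \quad \text{whereas} \quad \begin{bmatrix} 0 & 1 \\ 1 & 1 \end{bmatrix} \begin{bmatrix} 1 & 1 \\ 1 & 0 \end{bmatrix} = \begin{bmatrix} 1 & 0 \\ 2 & 1 \end{bmatrix}.$$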
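The row-by-column rule and the failure of commutativity can be checked with a short sketch. The function name `mat_mul` and the list-of-rows storage are assumptions for illustration; the 3 x 2 and 2 x 3 matrices are A and B from Example 3, and `P`, `Q` are a pair of 2 x 2 matrices chosen here to show that the order matters.

```python
def mat_mul(A, B):
    """Product of an n x m matrix A and an m x p matrix B (lists of rows)."""
    n, m, p = len(A), len(B), len(B[0])
    if any(len(row) != m for row in A):
        # Columns of A must equal rows of B for AB to be defined.
        raise ValueError("columns of A must equal rows of B")
    # c_ij = sum over k of a_ik * b_kj
    return [[sum(A[i][k] * B[k][j] for k in range(m)) for j in range(p)]
            for i in range(n)]


A = [[3, 2], [-1, 1], [1, 4]]     # 3 x 2
B = [[2, 1, -1], [3, 1, 2]]       # 2 x 3
AB = mat_mul(A, B)                # 3 x 3
BA = mat_mul(B, A)                # 2 x 2, a different dimension from AB

# Two square matrices whose products differ with the order of the factors.
P = [[1, 1], [1, 0]]
Q = [[0, 1], [1, 1]]
PQ = mat_mul(P, Q)
QP = mat_mul(Q, P)
```

Calling `mat_mul(A, A)` raises an error because A has two columns but three rows, so that product is not defined.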


Certain important operations involving matrix product do hold, however, as indicated 
in the following result. 


Theorem 6.8 Let A be an $n \times m$ matrix, B be an $m \times k$ matrix, C be a $k \times p$ matrix, D be an $m \times k$ matrix,
and $\lambda$ be a real number. The following properties hold:

(a) $A(BC) = (AB)C$;  (b) $A(B + D) = AB + AD$;  (c) $\lambda(AB) = (\lambda A)B = A(\lambda B)$.


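Proof The verification of the property in part (a) is presented to show the method involved.
The other parts can be shown in a similar manner.

To show that $A(BC) = (AB)C$, compute the ij-entry of each side of the equation. BC
is an $m \times p$ matrix with sj-entry

$$(BC)_{sj} = \sum_{l=1}^{k} b_{sl} c_{lj}.$$

Thus, $A(BC)$ is an $n \times p$ matrix with entries

$$[A(BC)]_{ij} = \sum_{s=1}^{m} a_{is} (BC)_{sj} = \sum_{s=1}^{m} a_{is} \left( \sum_{l=1}^{k} b_{sl} c_{lj} \right) = \sum_{s=1}^{m} \sum_{l=1}^{k} a_{is} b_{sl} c_{lj}.$$

Similarly, AB is an $n \times k$ matrix with entries

$$(AB)_{il} = \sum_{s=1}^{m} a_{is} b_{sl},$$

so $(AB)C$ is an $n \times p$ matrix with entries

$$[(AB)C]_{ij} = \sum_{l=1}^{k} (AB)_{il} c_{lj} = \sum_{l=1}^{k} \left( \sum_{s=1}^{m} a_{is} b_{sl} \right) c_{lj} = \sum_{l=1}^{k} \sum_{s=1}^{m} a_{is} b_{sl} c_{lj}.$$

Interchanging the order of summation on the right side gives

$$[(AB)C]_{ij} = \sum_{s=1}^{m} \sum_{l=1}^{k} a_{is} b_{sl} c_{lj} = [A(BC)]_{ij},$$

for each $i = 1, 2, \dots, n$ and $j = 1, 2, \dots, p$. So $A(BC) = (AB)C$.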


Square Matrices 


Matrices that have the same number of rows as columns are important in applications.


Definition 6.9 (i) A square matrix has the same number of rows as columns.

(ii) A diagonal matrix $D = [d_{ij}]$ is a square matrix with $d_{ij} = 0$ whenever $i \neq j$.

(iii) The identity matrix of order n, $I_n = [\delta_{ij}]$, is a diagonal matrix whose diagonal
entries are all 1s. When the size of $I_n$ is clear, this matrix is generally written simply
as I.

The term diagonal applied to a matrix refers to the entries in the diagonal that runs from the top
left entry to the bottom right entry.

For example, the identity matrix of order three is

$$I = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}.$$


Definition 6.10 An upper-triangular $n \times n$ matrix $U = [u_{ij}]$ has, for each $j = 1, 2, \dots, n$, the entries

$$u_{ij} = 0, \quad \text{for each } i = j + 1, j + 2, \dots, n;$$

and a lower-triangular matrix $L = [l_{ij}]$ has, for each $j = 1, 2, \dots, n$, the entries

$$l_{ij} = 0, \quad \text{for each } i = 1, 2, \dots, j - 1.$$

A triangular matrix is one that has all zero entries except either on and above (upper) or on and
below (lower) the main diagonal.

A diagonal matrix, then, is both upper triangular and lower triangular because its
only nonzero entries must lie on the main diagonal.


Illustration Consider the identity matrix of order three,

$$I_3 = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}.$$

If A is any $3 \times 3$ matrix, then

$$A I_3 = \begin{bmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{bmatrix} \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix} = \begin{bmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{bmatrix} = A.$$


The identity matrix $I_n$ commutes with any $n \times n$ matrix A; that is, the order of multiplication
does not matter,

$$I_n A = A = A I_n.$$

Keep in mind that commutativity of products does not hold in general, even for square matrices.


Inverse Matrices


Related to the linear systems is the inverse of a matrix.


Definition 6.11 An $n \times n$ matrix A is said to be nonsingular (or invertible) if an $n \times n$ matrix $A^{-1}$ exists
with $AA^{-1} = A^{-1}A = I$. The matrix $A^{-1}$ is called the inverse of A. A matrix without an
inverse is called singular (or noninvertible).

The word singular means something that deviates from the ordinary. Hence a singular matrix
does not have an inverse.


The following properties regarding matrix inverses follow from Definition 6.11. The
proofs of these results are considered in Exercise 9.


Theorem 6.12 For any nonsingular $n \times n$ matrix A:

(i) $A^{-1}$ is unique.

(ii) $A^{-1}$ is nonsingular and $(A^{-1})^{-1} = A$.

(iii) If B is also a nonsingular $n \times n$ matrix, then $(AB)^{-1} = B^{-1}A^{-1}$.
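Example 4 Let

$$A = \begin{bmatrix} 1 & 2 & -1 \\ 2 & 1 & 0 \\ -1 & 1 & 2 \end{bmatrix} \quad \text{and} \quad B = \begin{bmatrix} -\frac{2}{9} & \frac{5}{9} & -\frac{1}{9} \\ \frac{4}{9} & -\frac{1}{9} & \frac{2}{9} \\ -\frac{1}{3} & \frac{1}{3} & \frac{1}{3} \end{bmatrix}.$$

Show that $B = A^{-1}$, and that the solution to the linear system described by

$$\begin{aligned} x_1 + 2x_2 - x_3 &= 2, \\ 2x_1 + x_2 &= 3, \\ -x_1 + x_2 + 2x_3 &= 4, \end{aligned}$$

is given by the entries in Bb, where b is the column vector with entries 2, 3, and 4.


Solution First note that

$$AB = \begin{bmatrix} 1 & 2 & -1 \\ 2 & 1 & 0 \\ -1 & 1 & 2 \end{bmatrix} \begin{bmatrix} -\frac{2}{9} & \frac{5}{9} & -\frac{1}{9} \\ \frac{4}{9} & -\frac{1}{9} & \frac{2}{9} \\ -\frac{1}{3} & \frac{1}{3} & \frac{1}{3} \end{bmatrix} = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix} = I_3.$$

In a similar manner, $BA = I_3$, so A and B are both nonsingular with $B = A^{-1}$ and $A = B^{-1}$.
Now convert the given linear system to the matrix equation

$$\begin{bmatrix} 1 & 2 & -1 \\ 2 & 1 & 0 \\ -1 & 1 & 2 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} = \begin{bmatrix} 2 \\ 3 \\ 4 \end{bmatrix},$$

and multiply both sides by B, the inverse of A. Because we have both

$$B(A\mathbf{x}) = (BA)\mathbf{x} = I_3\mathbf{x} = \mathbf{x} \quad \text{and} \quad B(A\mathbf{x}) = B\mathbf{b},$$

we have

$$BA\mathbf{x} = \left( \begin{bmatrix} -\frac{2}{9} & \frac{5}{9} & -\frac{1}{9} \\ \frac{4}{9} & -\frac{1}{9} & \frac{2}{9} \\ -\frac{1}{3} & \frac{1}{3} & \frac{1}{3} \end{bmatrix} \begin{bmatrix} 1 & 2 & -1 \\ 2 & 1 & 0 \\ -1 & 1 & 2 \end{bmatrix} \right) \mathbf{x} = \mathbf{x}$$

and

$$BA\mathbf{x} = B\mathbf{b} = \begin{bmatrix} -\frac{2}{9} & \frac{5}{9} & -\frac{1}{9} \\ \frac{4}{9} & -\frac{1}{9} & \frac{2}{9} \\ -\frac{1}{3} & \frac{1}{3} & \frac{1}{3} \end{bmatrix} \begin{bmatrix} 2 \\ 3 \\ 4 \end{bmatrix} = \begin{bmatrix} \frac{7}{9} \\ \frac{13}{9} \\ \frac{5}{3} \end{bmatrix}.$$

This implies that $\mathbf{x} = B\mathbf{b}$ and gives the solution $x_1 = 7/9$, $x_2 = 13/9$, and $x_3 = 5/3$.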
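The claims in this example can be verified in exact rational arithmetic. The following is a minimal sketch using Python's standard `fractions` module; the helper names `mat_mul` and `mat_vec` are assumptions introduced here for illustration.

```python
from fractions import Fraction as F

A = [[1, 2, -1], [2, 1, 0], [-1, 1, 2]]
B = [[F(-2, 9), F(5, 9), F(-1, 9)],
     [F(4, 9), F(-1, 9), F(2, 9)],
     [F(-1, 3), F(1, 3), F(1, 3)]]
b = [2, 3, 4]


def mat_mul(X, Y):
    """Row-by-column product of two matrices stored as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]


def mat_vec(X, v):
    """Matrix-vector product Xv."""
    return [sum(a * c for a, c in zip(row, v)) for row in X]


I3 = [[int(i == j) for j in range(3)] for i in range(3)]

AB = mat_mul(A, B)       # should equal the identity of order three
BA = mat_mul(B, A)       # should also equal the identity
x = mat_vec(B, b)        # candidate solution x = Bb
check = mat_vec(A, x)    # substituting x back should reproduce b
```

Because the entries are stored as fractions, the comparison with the identity matrix is exact rather than subject to rounding.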


Although it is easy to solve a linear system of the form $A\mathbf{x} = \mathbf{b}$ if $A^{-1}$ is known,
it is not computationally efficient to determine $A^{-1}$ in order to solve the system. (See
Exercise 12.) Even so, it is useful from a conceptual standpoint to describe a method for
determining the inverse of a matrix.

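To find a method of computing $A^{-1}$ assuming A is nonsingular, let us look again at
matrix multiplication. Let $B_j$ be the jth column of the $n \times n$ matrix B,

$$B_j = \begin{bmatrix} b_{1j} \\ b_{2j} \\ \vdots \\ b_{nj} \end{bmatrix}.$$

If $AB = C$, then the jth column of C is given by the product

$$\begin{bmatrix} c_{1j} \\ c_{2j} \\ \vdots \\ c_{nj} \end{bmatrix} = C_j = AB_j = \begin{bmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ a_{21} & a_{22} & \cdots & a_{2n} \\ \vdots & \vdots & & \vdots \\ a_{n1} & a_{n2} & \cdots & a_{nn} \end{bmatrix} \begin{bmatrix} b_{1j} \\ b_{2j} \\ \vdots \\ b_{nj} \end{bmatrix} = \begin{bmatrix} \sum_{k=1}^{n} a_{1k} b_{kj} \\ \sum_{k=1}^{n} a_{2k} b_{kj} \\ \vdots \\ \sum_{k=1}^{n} a_{nk} b_{kj} \end{bmatrix}.$$

Suppose that $A^{-1}$ exists and that $A^{-1} = B = (b_{ij})$. Then $AB = I$ and

$$AB_j = \begin{bmatrix} 0 \\ \vdots \\ 0 \\ 1 \\ 0 \\ \vdots \\ 0 \end{bmatrix}, \quad \text{where the value 1 appears in the jth row.}$$

To find B we need to solve n linear systems in which the jth column of the inverse is the
solution of the linear system with right-hand side the jth column of I. The next illustration
demonstrates this method.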


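Illustration To determine the inverse of the matrix

$$A = \begin{bmatrix} 1 & 2 & -1 \\ 2 & 1 & 0 \\ -1 & 1 & 2 \end{bmatrix},$$

let us first consider the product AB, where B is an arbitrary $3 \times 3$ matrix.

$$AB = \begin{bmatrix} 1 & 2 & -1 \\ 2 & 1 & 0 \\ -1 & 1 & 2 \end{bmatrix} \begin{bmatrix} b_{11} & b_{12} & b_{13} \\ b_{21} & b_{22} & b_{23} \\ b_{31} & b_{32} & b_{33} \end{bmatrix} = \begin{bmatrix} b_{11} + 2b_{21} - b_{31} & b_{12} + 2b_{22} - b_{32} & b_{13} + 2b_{23} - b_{33} \\ 2b_{11} + b_{21} & 2b_{12} + b_{22} & 2b_{13} + b_{23} \\ -b_{11} + b_{21} + 2b_{31} & -b_{12} + b_{22} + 2b_{32} & -b_{13} + b_{23} + 2b_{33} \end{bmatrix}.$$

If $B = A^{-1}$, then $AB = I$, so

$$\begin{aligned} b_{11} + 2b_{21} - b_{31} &= 1, & b_{12} + 2b_{22} - b_{32} &= 0, & b_{13} + 2b_{23} - b_{33} &= 0, \\ 2b_{11} + b_{21} &= 0, & 2b_{12} + b_{22} &= 1, & 2b_{13} + b_{23} &= 0, \\ -b_{11} + b_{21} + 2b_{31} &= 0, & -b_{12} + b_{22} + 2b_{32} &= 0, & -b_{13} + b_{23} + 2b_{33} &= 1. \end{aligned}$$

Notice that the coefficients in each of the systems of equations are the same; the only
change in the systems occurs on the right side of the equations. As a consequence, Gaussian
elimination can be performed on a larger augmented matrix formed by combining the
matrices for each of the systems:

$$\left[\begin{array}{rrr|rrr} 1 & 2 & -1 & 1 & 0 & 0 \\ 2 & 1 & 0 & 0 & 1 & 0 \\ -1 & 1 & 2 & 0 & 0 & 1 \end{array}\right].$$

First, performing $(E_2 - 2E_1) \to (E_2)$ and $(E_3 + E_1) \to (E_3)$, followed by $(E_3 + E_2) \to (E_3)$,
produces

$$\left[\begin{array}{rrr|rrr} 1 & 2 & -1 & 1 & 0 & 0 \\ 0 & -3 & 2 & -2 & 1 & 0 \\ 0 & 3 & 1 & 1 & 0 & 1 \end{array}\right] \quad \text{and} \quad \left[\begin{array}{rrr|rrr} 1 & 2 & -1 & 1 & 0 & 0 \\ 0 & -3 & 2 & -2 & 1 & 0 \\ 0 & 0 & 3 & -1 & 1 & 1 \end{array}\right].$$

Backward substitution is performed on each of the three augmented matrices,

$$\left[\begin{array}{rrr|r} 1 & 2 & -1 & 1 \\ 0 & -3 & 2 & -2 \\ 0 & 0 & 3 & -1 \end{array}\right], \quad \left[\begin{array}{rrr|r} 1 & 2 & -1 & 0 \\ 0 & -3 & 2 & 1 \\ 0 & 0 & 3 & 1 \end{array}\right], \quad \left[\begin{array}{rrr|r} 1 & 2 & -1 & 0 \\ 0 & -3 & 2 & 0 \\ 0 & 0 & 3 & 1 \end{array}\right],$$

to eventually give

$$\begin{aligned} b_{11} &= -\tfrac{2}{9}, & b_{12} &= \tfrac{5}{9}, & b_{13} &= -\tfrac{1}{9}, \\ b_{21} &= \tfrac{4}{9}, & b_{22} &= -\tfrac{1}{9}, & b_{23} &= \tfrac{2}{9}, \\ b_{31} &= -\tfrac{1}{3}, & b_{32} &= \tfrac{1}{3}, & b_{33} &= \tfrac{1}{3}. \end{aligned}$$

As shown in Example 4, these are the entries of $A^{-1}$:

$$B = A^{-1} = \begin{bmatrix} -\frac{2}{9} & \frac{5}{9} & -\frac{1}{9} \\ \frac{4}{9} & -\frac{1}{9} & \frac{2}{9} \\ -\frac{1}{3} & \frac{1}{3} & \frac{1}{3} \end{bmatrix}.$$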

As we saw in the illustration, in order to compute $A^{-1}$ it is convenient to set up a larger
augmented matrix,

$$[A \mid I].$$

Upon performing the elimination in accordance with Algorithm 6.1, we obtain an augmented
matrix of the form

$$[U \mid Y],$$

where U is an upper-triangular matrix and Y is the matrix obtained by performing the same
operations on the identity I that were performed to take A into U.
Gaussian elimination with backward substitution requires

$$\frac{4}{3}n^3 - \frac{1}{3}n \text{ multiplications/divisions and } \frac{4}{3}n^3 - \frac{3}{2}n^2 + \frac{n}{6} \text{ additions/subtractions}$$

to solve the n linear systems (see Exercise 12(c)). Special care can be taken in the implementation
to note the operations that need not be performed, as, for example, a multiplication
when one of the multipliers is known to be unity or a subtraction when the subtrahend is
known to be 0. The number of multiplications/divisions required can then be reduced to
$n^3$ and the number of additions/subtractions reduced to $n^3 - 2n^2 + n$ (see Exercise 12(d)).
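To get a feel for the size of these counts, the formulas quoted above can be evaluated for a few values of n. The function names below are hypothetical, and the sketch only evaluates the stated formulas; it does not count operations in an actual elimination.

```python
def gauss_inverse_ops(n):
    """Counts for elimination with backward substitution on [A | I]."""
    mults = (4 * n**3 - n) // 3               # (4/3)n^3 - (1/3)n
    adds = (8 * n**3 - 9 * n**2 + n) // 6     # (4/3)n^3 - (3/2)n^2 + n/6
    return mults, adds


def reduced_inverse_ops(n):
    """Counts when operations with known 1s and 0s are skipped."""
    return n**3, n**3 - 2 * n**2 + n


for n in (3, 10, 50, 100):
    print(n, gauss_inverse_ops(n), reduced_inverse_ops(n))
```

Both integer divisions are exact, since $4n^3 - n$ is always divisible by 3 and $8n^3 - 9n^2 + n$ by 6.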
Transpose of a Matrix


Another important matrix associated with a given matrix A is its transpose, denoted $A^t$.


Definition 6.13 The transpose of an $n \times m$ matrix $A = [a_{ij}]$ is the $m \times n$ matrix $A^t = [a_{ji}]$, where for each i,
the ith column of $A^t$ is the same as the ith row of A. A square matrix A is called symmetric
if $A = A^t$.


Illustration The matrices

$$A = \begin{bmatrix} 7 & 2 & 0 \\ 3 & 5 & -1 \\ 0 & 5 & -6 \end{bmatrix}, \quad B = \begin{bmatrix} 2 & 4 & 7 \\ 3 & -5 & -1 \end{bmatrix}, \quad C = \begin{bmatrix} 6 & 4 & -3 \\ 4 & -2 & 0 \\ -3 & 0 & 1 \end{bmatrix}$$

have transposes

$$A^t = \begin{bmatrix} 7 & 3 & 0 \\ 2 & 5 & 5 \\ 0 & -1 & -6 \end{bmatrix}, \quad B^t = \begin{bmatrix} 2 & 3 \\ 4 & -5 \\ 7 & -1 \end{bmatrix}, \quad C^t = \begin{bmatrix} 6 & 4 & -3 \\ 4 & -2 & 0 \\ -3 & 0 & 1 \end{bmatrix}.$$

The matrix C is symmetric because $C^t = C$. The matrices A and B are not symmetric.


The proof of the next result follows directly from the definition of the transpose.


Theorem 6.14 The following operations involving the transpose of a matrix hold whenever the operation
is possible:

(i) $(A^t)^t = A$,
(ii) $(A + B)^t = A^t + B^t$,
(iii) $(AB)^t = B^t A^t$,
(iv) if $A^{-1}$ exists, then $(A^{-1})^t = (A^t)^{-1}$.
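A short sketch makes the transpose and the symmetry test concrete and spot-checks two of the properties on sample data. The helper names (`transpose`, `is_symmetric`, `mat_mul`) and the matrices `P` and `Q` are assumptions chosen for illustration; a numerical check on one example is, of course, not a proof.

```python
def transpose(A):
    """Rows of the result are the columns of A."""
    return [list(col) for col in zip(*A)]


def is_symmetric(A):
    """A square matrix is symmetric when it equals its transpose."""
    return A == transpose(A)


def mat_mul(X, Y):
    """Row-by-column product of two matrices stored as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]


A = [[7, 2, 0], [3, 5, -1], [0, 5, -6]]
C = [[6, 4, -3], [4, -2, 0], [-3, 0, 1]]

# Sample 3 x 2 and 2 x 3 matrices for checking (PQ)^t = Q^t P^t.
P = [[3, 2], [-1, 1], [1, 4]]
Q = [[2, 1, -1], [3, 1, 2]]
lhs = transpose(mat_mul(P, Q))
rhs = mat_mul(transpose(Q), transpose(P))
```

Note that the order of the factors is reversed on the right side; with `P` of size 3 x 2 and `Q` of size 2 x 3, both sides are 3 x 3.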


Matrix arithmetic is performed in Maple using the LinearAlgebra package whenever 
the operations are defined. For example, the addition of two n x m matrices A and B is 
done in Maple with the command A + B, and scalar multiplication by a number c is defined 
by cA. 

If A is n x m and B is m x p, then the n x p matrix AB is produced with the command
A.B . Matrix transposition is achieved with Transpose(A) and matrix inversion, with
MatrixInverse(A).


EXERCISE SET 6.3


1. Perform the following matrix-vector multiplications:


[ole » [al 


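c. $\begin{bmatrix} 2 & 0 & 0 \\ 3 & -1 & 2 \\ 0 & 2 & -3 \end{bmatrix} \begin{bmatrix} 2 \\ 5 \\ 1 \end{bmatrix}$   d. $\begin{bmatrix} -4 & 0 & 0 \end{bmatrix} \begin{bmatrix} 1 & -2 & 4 \\ -2 & 3 & 1 \\ 4 & 1 & 0 \end{bmatrix}$

2. Perform the following matrix-vector multiplications: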


memes cree 


2 1 0 2 3 =2 
c. 1 -1 2 5 d. [2 -—2 1]] -—2 3 1 
0 2 4 1 0 t= 


3. Perform the following matrix-matrix multiplications: 


ei 2 -3 1 5 b 2° =3 1 5 -4 
° 3 -l 2 0 : 3. -1 =3. 2 0 
2 -3 1 Oo 1 -2 2 1 2 1 -2 
Cc. 4 3 0 1 0 -!l d. —2 3 0 —4 1 
5 2 —4 2 3 -=2 2 -1 3 0 2 
4. Perform the following matrix-matrix multiplications: 
‘. —2 3 2 =5 > : 2 —2 3 
° 0 3 =) 2 —3 2 2 
[ 2-3 -2 2-3 4 3 E —-1 2 
c. —3 4 1 =3 4-1 2 5 3 4 -l 
[| -2 1 —-4 4 -1 -2 —2 1 4 3 =5 
5. Determine which of the following matrices are nonsingular, and compute the inverse of these matrices: 
[ 4 2 6 1 2 0 
a 3 0 7 b. 2 1 -l 
| -—2 -l -3 3 1 1 
Fo4id1 -t 1 4 0 0 0 
P 12 -4 -2 d 6 0 0 
° 2-11 1 5 ° 9 11 1 0 
{| -l 0 -—2 —4 5 1 1 


6. Determine which of the following matrices are nonsingular, and compute the inverse of these matrices: 


1 2 -1l 4 0 0 
a. 0 1 2 0 
[| -1 4 3 m fone 
fr 12 3 4 2 0 1 2 
21-1 | wi 02 
. —3 2 0 1 2 3 1 
L 0 5 2 6 [3 - i od | 


7. Given the two $4 \times 4$ linear systems having the same coefficient matrix:

$$\begin{aligned} x_1 - x_2 + 2x_3 - x_4 &= 6, & x_1 - x_2 + 2x_3 - x_4 &= 1, \\ x_1 - x_3 + x_4 &= 4, & x_1 - x_3 + x_4 &= 1, \\ 2x_1 + x_2 + 3x_3 - 4x_4 &= -2, & 2x_1 + x_2 + 3x_3 - 4x_4 &= 2, \\ -x_2 + x_3 - x_4 &= 5; & -x_2 + x_3 - x_4 &= -1. \end{aligned}$$

a. Solve the linear systems by applying Gaussian elimination to the augmented matrix

$$\left[\begin{array}{rrrr|rr} 1 & -1 & 2 & -1 & 6 & 1 \\ 1 & 0 & -1 & 1 & 4 & 1 \\ 2 & 1 & 3 & -4 & -2 & 2 \\ 0 & -1 & 1 & -1 & 5 & -1 \end{array}\right].$$

b. Solve the linear systems by finding and multiplying by the inverse of

$$A = \begin{bmatrix} 1 & -1 & 2 & -1 \\ 1 & 0 & -1 & 1 \\ 2 & 1 & 3 & -4 \\ 0 & -1 & 1 & -1 \end{bmatrix}.$$

c. Which method requires more operations?


8. Consider the four $3 \times 3$ linear systems having the same coefficient matrix:

$$\begin{aligned} 2x_1 - 3x_2 + x_3 &= 2, & 2x_1 - 3x_2 + x_3 &= 6, \\ x_1 + x_2 - x_3 &= -1, & x_1 + x_2 - x_3 &= 4, \\ -x_1 + x_2 - 3x_3 &= 0; & -x_1 + x_2 - 3x_3 &= 5; \\[6pt] 2x_1 - 3x_2 + x_3 &= 0, & 2x_1 - 3x_2 + x_3 &= -1, \\ x_1 + x_2 - x_3 &= 1, & x_1 + x_2 - x_3 &= 0, \\ -x_1 + x_2 - 3x_3 &= -3; & -x_1 + x_2 - 3x_3 &= 0. \end{aligned}$$

a. Solve the linear systems by applying Gaussian elimination to the augmented matrix

$$\left[\begin{array}{rrr|rrrr} 2 & -3 & 1 & 2 & 6 & 0 & -1 \\ 1 & 1 & -1 & -1 & 4 & 1 & 0 \\ -1 & 1 & -3 & 0 & 5 & -3 & 0 \end{array}\right].$$

b. Solve the linear systems by finding and multiplying by the inverse of

$$A = \begin{bmatrix} 2 & -3 & 1 \\ 1 & 1 & -1 \\ -1 & 1 & -3 \end{bmatrix}.$$

c. Which method requires more operations?
9. The following statements are needed to prove Theorem 6.12. 

a. Show that if $A^{-1}$ exists, it is unique.
b. Show that if A is nonsingular, then $(A^{-1})^{-1} = A$.
c. Show that if A and B are nonsingular $n \times n$ matrices, then $(AB)^{-1} = B^{-1}A^{-1}$.

10. Prove the following statements or provide counterexamples to show they are not true. 
a. The product of two symmetric matrices is symmetric. 
b. The inverse of a nonsingular symmetric matrix is a nonsingular symmetric matrix. 
c. If A and B are $n \times n$ matrices, then $(AB)^t = A^t B^t$.

11. a. Show that the product of two n x n lower triangular matrices is lower triangular. 
b. Show that the product of two n x n upper triangular matrices is upper triangular. 
c. Show that the inverse of a nonsingular n x n lower triangular matrix is lower triangular. 


12. Suppose m linear systems

$$A\mathbf{x}^{(p)} = \mathbf{b}^{(p)}, \quad p = 1, 2, \dots, m,$$

are to be solved, each with the $n \times n$ coefficient matrix A.

a. Show that Gaussian elimination with backward substitution applied to the augmented matrix

$$[A : \mathbf{b}^{(1)} \mathbf{b}^{(2)} \cdots \mathbf{b}^{(m)}]$$

requires

$$\frac{1}{3}n^3 + mn^2 - \frac{1}{3}n \text{ multiplications/divisions}$$

and

$$\frac{1}{3}n^3 + mn^2 - \frac{1}{2}n^2 - mn + \frac{1}{6}n \text{ additions/subtractions.}$$

b. Show that the Gauss-Jordan method (see Exercise 12, Section 6.1) applied to the augmented
matrix

$$[A : \mathbf{b}^{(1)} \mathbf{b}^{(2)} \cdots \mathbf{b}^{(m)}]$$

requires

$$\frac{1}{2}n^3 + mn^2 - \frac{1}{2}n \text{ multiplications/divisions}$$

and

$$\frac{1}{2}n^3 + (m - 1)n^2 + \left(\frac{1}{2} - m\right)n \text{ additions/subtractions.}$$

c. For the special case in which $\mathbf{b}^{(p)}$ is the column vector whose pth entry is 1 and whose other entries are 0,
for each $p = 1, \dots, m$, with $m = n$, the solution $\mathbf{x}^{(p)}$ is the pth column of $A^{-1}$. Show that Gaussian
elimination with backward substitution requires

$$\frac{4}{3}n^3 - \frac{1}{3}n \text{ multiplications/divisions}$$

and

$$\frac{4}{3}n^3 - \frac{3}{2}n^2 + \frac{1}{6}n \text{ additions/subtractions}$$

for this application, and that the Gauss-Jordan method requires

$$\frac{3}{2}n^3 - \frac{1}{2}n \text{ multiplications/divisions}$$

and

$$\frac{3}{2}n^3 - 2n^2 + \frac{1}{2}n \text{ additions/subtractions.}$$

d. Construct an algorithm using Gaussian elimination to find $A^{-1}$, but do not perform multiplications
when one of the multipliers is known to be 1, and do not perform additions/subtractions
when one of the elements involved is known to be 0. Show that the required computations are
reduced to $n^3$ multiplications/divisions and $n^3 - 2n^2 + n$ additions/subtractions.

e. Show that solving the linear system $A\mathbf{x} = \mathbf{b}$, when $A^{-1}$ is known, still requires $n^2$ multiplications/divisions
and $n^2 - n$ additions/subtractions.

f. Show that solving m linear systems $A\mathbf{x}^{(p)} = \mathbf{b}^{(p)}$, for $p = 1, 2, \dots, m$, by the method $\mathbf{x}^{(p)} = A^{-1}\mathbf{b}^{(p)}$
requires $mn^2$ multiplications and $m(n^2 - n)$ additions, if $A^{-1}$ is known.

g. Let A be an $n \times n$ matrix. Compare the number of operations required to solve n linear systems
involving A by Gaussian elimination with backward substitution and by first inverting A and
then multiplying $A\mathbf{x} = \mathbf{b}$ by $A^{-1}$, for n = 3, 10, 50, 100. Is it ever advantageous to compute $A^{-1}$
for the purpose of solving linear systems?
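13. Use the algorithm developed in Exercise 12(d) to find the inverses of the nonsingular matrices in
Exercise 5.

14. It is often useful to partition matrices into a collection of submatrices. For example, the matrices

$$A = \begin{bmatrix} 1 & 2 & -1 \\ 3 & -4 & -3 \\ 6 & 5 & 0 \end{bmatrix} \quad \text{and} \quad B = \begin{bmatrix} 2 & -1 & 7 & 0 \\ 3 & 0 & 4 & 5 \\ -2 & 1 & -3 & 1 \end{bmatrix}$$

can be partitioned into

$$\left[\begin{array}{rr|r} 1 & 2 & -1 \\ 3 & -4 & -3 \\ \hline 6 & 5 & 0 \end{array}\right] = \begin{bmatrix} A_{11} & A_{12} \\ A_{21} & A_{22} \end{bmatrix} \quad \text{and} \quad \left[\begin{array}{rrr|r} 2 & -1 & 7 & 0 \\ 3 & 0 & 4 & 5 \\ \hline -2 & 1 & -3 & 1 \end{array}\right] = \begin{bmatrix} B_{11} & B_{12} \\ B_{21} & B_{22} \end{bmatrix}.$$

a. Show that the product of A and B in this case is

$$AB = \begin{bmatrix} A_{11}B_{11} + A_{12}B_{21} & A_{11}B_{12} + A_{12}B_{22} \\ A_{21}B_{11} + A_{22}B_{21} & A_{21}B_{12} + A_{22}B_{22} \end{bmatrix}.$$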
b. If B were instead partitioned into 


would the result in part (a) hold? 
c. Make a conjecture concerning the conditions necessary for the result in part (a) to hold in the 
general case. 
15. In a paper entitled "Population Waves," Bernadelli [Ber] (see also [Se]) hypothesizes a type of simplified
beetle that has a natural life span of 3 years. The female of this species has a survival rate of
$\frac{1}{2}$ in the first year of life, has a survival rate of $\frac{1}{3}$ from the second to third years, and gives birth to an
average of six new females before expiring at the end of the third year. A matrix can be used to show
the contribution an individual female beetle makes, in a probabilistic sense, to the female population
of the species by letting $a_{ij}$ in the matrix $A = [a_{ij}]$ denote the contribution that a single female beetle
of age j will make to the next year's female population of age i; that is,

$$A = \begin{bmatrix} 0 & 0 & 6 \\ \frac{1}{2} & 0 & 0 \\ 0 & \frac{1}{3} & 0 \end{bmatrix}.$$

a. The contribution that a female beetle makes to the population 2 years hence is determined from
the entries of $A^2$, of 3 years hence from $A^3$, and so on. Construct $A^2$ and $A^3$, and try to make a
general statement about the contribution of a female beetle to the population in n years' time for
any positive integral value of n.

b. Use your conclusions from part (a) to describe what will occur in future years to a population
of these beetles that initially consists of 6000 female beetles in each of the three age groups.

c. Construct $A^{-1}$, and describe its significance regarding the population of this species.

16. The study of food chains is an important topic in the determination of the spread and accumulation
of environmental pollutants in living matter. Suppose that a food chain has three links. The first link
consists of vegetation of types $v_1, v_2, \dots, v_n$, which provide all the food requirements for herbivores of
species $h_1, h_2, \dots, h_m$ in the second link. The third link consists of carnivorous animals $c_1, c_2, \dots, c_k$,
which depend entirely on the herbivores in the second link for their food supply. The coordinate $a_{ij}$
of the matrix

$$A = \begin{bmatrix} a_{11} & a_{12} & \cdots & a_{1m} \\ a_{21} & a_{22} & \cdots & a_{2m} \\ \vdots & \vdots & & \vdots \\ a_{n1} & a_{n2} & \cdots & a_{nm} \end{bmatrix}$$

represents the total number of plants of type $v_i$ eaten by the herbivores in the species $h_j$, whereas
$b_{ij}$ in

$$B = \begin{bmatrix} b_{11} & b_{12} & \cdots & b_{1k} \\ b_{21} & b_{22} & \cdots & b_{2k} \\ \vdots & \vdots & & \vdots \\ b_{m1} & b_{m2} & \cdots & b_{mk} \end{bmatrix}$$

describes the number of herbivores in species $h_i$ that are devoured by the animals of type $c_j$.

a. Show that the number of plants of type $v_i$ that eventually end up in the animals of species $c_j$ is
given by the entry in the ith row and jth column of the matrix AB.

b. What physical significance is associated with the matrices $A^{-1}$, $B^{-1}$, and $(AB)^{-1} = B^{-1}A^{-1}$?
17. In Section 3.6 we found that the parametric form $(x(t), y(t))$ of the cubic Hermite polynomials through
$(x(0), y(0)) = (x_0, y_0)$ and $(x(1), y(1)) = (x_1, y_1)$ with guide points $(x_0 + \alpha_0, y_0 + \beta_0)$ and $(x_1 - \alpha_1, y_1 - \beta_1)$,
respectively, are given by

$$x(t) = (2(x_0 - x_1) + (\alpha_0 + \alpha_1))t^3 + (3(x_1 - x_0) - \alpha_1 - 2\alpha_0)t^2 + \alpha_0 t + x_0,$$

and

$$y(t) = (2(y_0 - y_1) + (\beta_0 + \beta_1))t^3 + (3(y_1 - y_0) - \beta_1 - 2\beta_0)t^2 + \beta_0 t + y_0.$$

The Bézier cubic polynomials have the form

$$\hat{x}(t) = (2(x_0 - x_1) + 3(\alpha_0 + \alpha_1))t^3 + (3(x_1 - x_0) - 3(\alpha_1 + 2\alpha_0))t^2 + 3\alpha_0 t + x_0$$

and

$$\hat{y}(t) = (2(y_0 - y_1) + 3(\beta_0 + \beta_1))t^3 + (3(y_1 - y_0) - 3(\beta_1 + 2\beta_0))t^2 + 3\beta_0 t + y_0.$$

a. Show that the matrix

$$A = \begin{bmatrix} 7 & 4 & 4 & 0 \\ -6 & -3 & -6 & 0 \\ 0 & 0 & 3 & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix}$$

transforms the Hermite polynomial coefficients into the Bézier polynomial coefficients.

b. Determine a matrix B that transforms the Bézier polynomial coefficients into the Hermite polynomial
coefficients.

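18. Consider the $2 \times 2$ linear system $(A + iB)(\mathbf{x} + i\mathbf{y}) = \mathbf{c} + i\mathbf{d}$ with complex entries in component form:

$$\begin{aligned} (a_{11} + ib_{11})(x_1 + iy_1) + (a_{12} + ib_{12})(x_2 + iy_2) &= c_1 + id_1, \\ (a_{21} + ib_{21})(x_1 + iy_1) + (a_{22} + ib_{22})(x_2 + iy_2) &= c_2 + id_2. \end{aligned}$$

a. Use the properties of complex numbers to convert this system to the equivalent $4 \times 4$ real linear
system

$$\begin{aligned} A\mathbf{x} - B\mathbf{y} &= \mathbf{c}, \\ B\mathbf{x} + A\mathbf{y} &= \mathbf{d}. \end{aligned}$$

b. Solve the linear system

$$\begin{aligned} (1 - 2i)(x_1 + iy_1) + (3 + 2i)(x_2 + iy_2) &= 5 + 2i, \\ (2 + i)(x_1 + iy_1) + (4 + 3i)(x_2 + iy_2) &= 4 - i. \end{aligned}$$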
6.4 The Determinant of a Matrix


The determinant of a matrix provides existence and uniqueness results for linear systems
having the same number of equations and unknowns. We will denote the determinant of a
square matrix A by det A, but it is also common to use the notation |A|.


Definition 6.15 Suppose that A is a square matrix.

(i) If $A = [a]$ is a $1 \times 1$ matrix, then $\det A = a$.

(ii) If A is an $n \times n$ matrix, with $n > 1$, the minor $M_{ij}$ is the determinant of the
$(n - 1) \times (n - 1)$ submatrix of A obtained by deleting the ith row and jth column
of the matrix A.

(iii) The cofactor $A_{ij}$ associated with $M_{ij}$ is defined by $A_{ij} = (-1)^{i+j} M_{ij}$.

(iv) The determinant of the $n \times n$ matrix A, when $n > 1$, is given either by

$$\det A = \sum_{j=1}^{n} a_{ij} A_{ij} = \sum_{j=1}^{n} (-1)^{i+j} a_{ij} M_{ij}, \quad \text{for any } i = 1, 2, \dots, n,$$

or by

$$\det A = \sum_{i=1}^{n} a_{ij} A_{ij} = \sum_{i=1}^{n} (-1)^{i+j} a_{ij} M_{ij}, \quad \text{for any } j = 1, 2, \dots, n.$$

The notion of a determinant appeared independently in 1683 both in Japan and Europe,
although neither Takakazu Seki Kowa (1642-1708) nor Gottfried Leibniz (1646-1716) appear to
have used the term determinant.

It can be shown (see Exercise 9) that to calculate the determinant of a general n x n 
matrix by this definition requires O(n!) multiplications/divisions and additions/subtractions. 
Even for relatively small values of n, the number of calculations becomes unwieldy. 

Although it appears that there are 2n different definitions of det A, depending on which 
row or column is chosen, all definitions give the same numerical result. The flexibility in 
the definition is used in the following example. It is most convenient to compute det A 
across the row or down the column with the most zeros. 


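Example 1 Find the determinant of the matrix

$$A = \begin{bmatrix} 2 & -1 & 3 & 0 \\ 4 & -2 & 7 & 0 \\ -3 & -4 & 1 & 5 \\ 6 & -6 & 8 & 0 \end{bmatrix}$$

using the row or column with the most zero entries.


Solution To compute det A, it is easiest to use the fourth column:

$$\det A = a_{14}A_{14} + a_{24}A_{24} + a_{34}A_{34} + a_{44}A_{44} = 5A_{34} = -5M_{34}.$$

Eliminating the third row and the fourth column gives

$$\begin{aligned} \det A &= -5 \det \begin{bmatrix} 2 & -1 & 3 \\ 4 & -2 & 7 \\ 6 & -6 & 8 \end{bmatrix} \\ &= -5 \left\{ 2 \det \begin{bmatrix} -2 & 7 \\ -6 & 8 \end{bmatrix} - (-1) \det \begin{bmatrix} 4 & 7 \\ 6 & 8 \end{bmatrix} + 3 \det \begin{bmatrix} 4 & -2 \\ 6 & -6 \end{bmatrix} \right\} = -30. \end{aligned}$$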
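The recursive definition translates almost word for word into code. The sketch below always expands along the first row (a choice made here for simplicity, not the row or column with the most zeros), and the function name `det_cofactor` is hypothetical. Its cost grows like n!, so it is useful only for small matrices.

```python
def det_cofactor(A):
    """Determinant by cofactor expansion along the first row."""
    n = len(A)
    if n == 1:
        return A[0][0]
    total = 0
    for j in range(n):
        if A[0][j] == 0:
            continue  # a zero entry contributes nothing to the expansion
        # Minor: delete the first row and column j (0-based index).
        minor = [row[:j] + row[j + 1:] for row in A[1:]]
        # With 0-based j the sign (-1)^(1 + (j + 1)) reduces to (-1)^j.
        total += (-1) ** j * A[0][j] * det_cofactor(minor)
    return total


# Sample 4 x 4 matrix whose fourth column has a single nonzero entry.
A = [[2, -1, 3, 0],
     [4, -2, 7, 0],
     [-3, -4, 1, 5],
     [6, -6, 8, 0]]
M34 = [[2, -1, 3], [4, -2, 7], [6, -6, 8]]   # delete row 3 and column 4

d_minor = det_cofactor(M34)
d_full = det_cofactor(A)
```

Since every expansion gives the same value, expanding along the first row here must agree with the fourth-column expansion $-5M_{34}$.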
The determinant of an n x n matrix can be computed in Maple with the LinearAlgebra
package using the command Determinant(A).

The following properties are useful in relating linear systems and Gaussian elimination 
to determinants. These are proved in any standard linear algebra text. 


Theorem 6.16 Suppose A is an $n \times n$ matrix:

(i) If any row or column of A has only zero entries, then $\det A = 0$.

(ii) If A has two rows or two columns the same, then $\det A = 0$.

(iii) If $\tilde{A}$ is obtained from A by the operation $(E_i) \leftrightarrow (E_j)$, with $i \neq j$, then $\det \tilde{A} = -\det A$.

(iv) If $\tilde{A}$ is obtained from A by the operation $(\lambda E_i) \to (E_i)$, then $\det \tilde{A} = \lambda \det A$.

(v) If $\tilde{A}$ is obtained from A by the operation $(E_i + \lambda E_j) \to (E_i)$ with $i \neq j$, then
$\det \tilde{A} = \det A$.

(vi) If B is also an $n \times n$ matrix, then $\det AB = \det A \det B$.

(vii) $\det A^t = \det A$.

(viii) When $A^{-1}$ exists, $\det A^{-1} = (\det A)^{-1}$.

(ix) If A is an upper triangular, lower triangular, or diagonal matrix, then

$$\det A = \prod_{i=1}^{n} a_{ii}.$$


As part (ix) of Theorem 6.16 indicates, the determinant of a triangular matrix is simply
the product of its diagonal elements. By employing the row operations given in parts (iii),
(iv), and (v) we can reduce a given square matrix to triangular form to find its determinant.


Example 2 Compute the determinant of the matrix

$$A = \begin{bmatrix} 2 & 1 & -1 & 1 \\ 1 & 1 & 0 & 3 \\ -1 & 2 & 3 & -1 \\ 3 & -1 & -1 & 2 \end{bmatrix}$$

using parts (iii), (iv), and (v) of Theorem 6.16, doing the computations in Maple with the
LinearAlgebra package.

Solution Matrix A is defined in Maple by

A := Matrix([[2, 1, -1, 1], [1, 1, 0, 3], [-1, 2, 3, -1], [3, -1, -1, 2]])

The sequence of operations in Table 6.2 produces the matrix

$$A8 = \begin{bmatrix} 1 & \frac{1}{2} & -\frac{1}{2} & \frac{1}{2} \\ 0 & 1 & 1 & 5 \\ 0 & 0 & 3 & 13 \\ 0 & 0 & 0 & -13 \end{bmatrix}.$$

By part (ix), det A8 = -39, so det A = 39.
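The same idea, reduce to triangular form while tracking how each row operation changes the determinant, can be sketched without Maple. The version below is an illustrative variant rather than the exact sequence of operations in the example: it never scales a row, so only row interchanges (sign changes, part (iii)) need to be tracked, and the result is the signed product of the pivots (part (ix)). The function name `det_by_elimination` and the use of exact `Fraction` arithmetic are assumptions.

```python
from fractions import Fraction


def det_by_elimination(A):
    """Determinant via reduction to upper-triangular form."""
    n = len(A)
    M = [[Fraction(x) for x in row] for row in A]
    sign = 1
    for i in range(n):
        # Find a row at or below i with a nonzero entry in column i.
        p = next((r for r in range(i, n) if M[r][i] != 0), None)
        if p is None:
            return Fraction(0)        # no pivot available: determinant is 0
        if p != i:
            M[i], M[p] = M[p], M[i]   # interchange: determinant changes sign
            sign = -sign
        for r in range(i + 1, n):
            m = M[r][i] / M[i][i]
            # (E_r - m E_i) -> (E_r) leaves the determinant unchanged.
            M[r] = [x - m * y for x, y in zip(M[r], M[i])]
    prod = Fraction(1)
    for i in range(n):
        prod *= M[i][i]               # product of the diagonal entries
    return sign * prod


A = [[2, 1, -1, 1], [1, 1, 0, 3], [-1, 2, 3, -1], [3, -1, -1, 2]]
d = det_by_elimination(A)
```

For this matrix the third pivot position is zero after two elimination steps, so one interchange occurs, matching the single sign change recorded for the example.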
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Table 6.2 


Theorem 6.17 


Corollary 6.18 


Direct Methods for Solving Linear Systems 


Operation Maple Effect 

LE; > E Al := RowOperation(A, 1, +) det Al = }detA 

E) — E, > E> A2 := RowOperation(A1, [2, 1], —1) det A2 = det Al = 3 detA 
E; +E, > E; A3 := RowOperation(A2, [3, 1], 1) det A3 = det A2 = 5 det A 
E,—3E, > Ey A4 := RowOperation(A3, [4, 1], —3) det A4 = det A3 = 5 det A 
2Ey > E> AS := RowOperation(A4, 2, 2) det A5 = 2 det A4 = det A 
E; — 3E) > E; A6 := RowOperation(A5, [3, 2], —3) det A6 = det AS = detA 
E,+ 3E) > Ey AT := RowOperation(A6, [4, 2], 2) det A7 = det A6 = det A 

E; = E4 A8 := RowOperation(A7, [3, 4]) det A8 = — detA7 = — detA 


The key result relating nonsingularity, Gaussian elimination, linear systems, and de- 
terminants is that the following statements are equivalent. 


The following statements are equivalent for any n x n matrix A: 


(i) The equation Ax = 0 has the unique solution x = 0. 


(ii) The system Ax = b has a unique solution for any n-dimensional column 
vector b. 


(iii) The matrix A is nonsingular; that is, A~! exists. 
(iv) detA 40. 


(v) Gaussian elimination with row interchanges can be performed on the system 
Ax = b for any n-dimensional column vector b. a 


The following Corollary to Theorem 6.17 illustrates how the determinant can be used 
to show important properties about square matrices. 


Suppose that A and B are both n x n matrices with either AB = J or BA = J. Then B = A7! 
(and A = B™'). a 


Proof Suppose that $AB = I$. Then by part (vi) of Theorem 6.16,

\[ 1 = \det(I) = \det(AB) = \det(A) \cdot \det(B), \quad \text{so} \quad \det(A) \ne 0 \ \text{and} \ \det(B) \ne 0. \]

The equivalence of parts (iii) and (iv) of Theorem 6.17 implies that both $A^{-1}$ and $B^{-1}$ exist.
Hence

\[ A^{-1} = A^{-1} \cdot I = A^{-1} \cdot (AB) = (A^{-1}A) \cdot B = I \cdot B = B. \]

The roles of $A$ and $B$ are similar, so this also establishes that $BA = I$. Hence $B = A^{-1}$.

EXERCISE SET 6.4


1. Use Definition 6.15 to compute the determinants of the following matrices: 


1 2 0 4 0 1 
a. 2 1 -il b. 2 1 0 
3 1 1 2 2 3 
1 1 -1l 1 2 Oo tl 2 
| 12 -4 -2 ] d 1 1 0 2 
ve 21 1 5 a a ee ae 
| -1 0 -2 -4 | 3 -1 4 3 | 
2. Use Definition 6.15 to compute the determinants of the following matrices: 
4 2 6 2. 2 1 
a. -1 0 4 b. 3.4 -1 
2 1 £7 3 0 5 
1 1 2 1 1 2 3 4 
2 -1 2 0 d 2 1 -1 1 
“ 3 411 aan ee a a 
—-1 5 2 3 0 5 2 6 


3. Repeat Exercise 1 using the method of Example 2. 


4. Repeat Exercise 2 using the method of Example 2.

5. Find all values of $\alpha$ that make the following matrix singular.

\[ A = \begin{bmatrix} 1 & -1 & \alpha \\ 2 & 2 & 1 \\ 0 & \alpha & -3 \end{bmatrix} \]

7. Find all values of $\alpha$ so that the following linear system has no solutions.

\[ \begin{aligned} 2x_1 - x_2 + 3x_3 &= 5, \\ 4x_1 + 2x_2 + 2x_3 &= 6, \\ -2x_1 + \alpha x_2 + 3x_3 &= 4. \end{aligned} \]

8. Find all values of $\alpha$ so that the following linear system has an infinite number of solutions.

\[ \begin{aligned} 2x_1 - x_2 + 3x_3 &= 5, \\ 4x_1 + 2x_2 + 2x_3 &= 6, \\ -2x_1 + \alpha x_2 + 3x_3 &= 1. \end{aligned} \]


9. Use mathematical induction to show that when $n > 1$, the evaluation of the determinant of an $n \times n$
matrix using the definition requires

\[ n! \sum_{k=1}^{n-1} \frac{1}{k!} \ \text{multiplications/divisions and} \ n! - 1 \ \text{additions/subtractions.} \]

10. Let $A$ be a $3 \times 3$ matrix. Show that if $\tilde{A}$ is the matrix obtained from $A$ using any of the operations

\[ (E_1) \leftrightarrow (E_2), \quad (E_1) \leftrightarrow (E_3), \quad \text{or} \quad (E_2) \leftrightarrow (E_3), \]

then $\det \tilde{A} = -\det A$.

11. Prove that $AB$ is nonsingular if and only if both $A$ and $B$ are nonsingular.


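12. The solution by Cramer's rule to the linear system

\[ \begin{aligned} a_{11}x_1 + a_{12}x_2 + a_{13}x_3 &= b_1, \\ a_{21}x_1 + a_{22}x_2 + a_{23}x_3 &= b_2, \\ a_{31}x_1 + a_{32}x_2 + a_{33}x_3 &= b_3, \end{aligned} \]

has

\[ x_1 = \frac{1}{D} \det \begin{bmatrix} b_1 & a_{12} & a_{13} \\ b_2 & a_{22} & a_{23} \\ b_3 & a_{32} & a_{33} \end{bmatrix} \equiv \frac{D_1}{D}, \qquad
x_2 = \frac{1}{D} \det \begin{bmatrix} a_{11} & b_1 & a_{13} \\ a_{21} & b_2 & a_{23} \\ a_{31} & b_3 & a_{33} \end{bmatrix} \equiv \frac{D_2}{D}, \]

and

\[ x_3 = \frac{1}{D} \det \begin{bmatrix} a_{11} & a_{12} & b_1 \\ a_{21} & a_{22} & b_2 \\ a_{31} & a_{32} & b_3 \end{bmatrix} \equiv \frac{D_3}{D}, \quad \text{where} \quad
D = \det \begin{bmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{bmatrix}. \]

a. Find the solution to the linear system

\[ \begin{aligned} 2x_1 + 3x_2 - x_3 &= 4, \\ x_1 - 2x_2 + x_3 &= 6, \\ x_1 - 12x_2 + 5x_3 &= 10, \end{aligned} \]

by Cramer's rule.

b. Show that the linear system

\[ \begin{aligned} 2x_1 + 3x_2 - x_3 &= 4, \\ x_1 - 2x_2 + x_3 &= 6, \\ -x_1 - 12x_2 + 5x_3 &= 9 \end{aligned} \]

does not have a solution. Compute $D_1$, $D_2$, and $D_3$.

c. Show that the linear system

\[ \begin{aligned} 2x_1 + 3x_2 - x_3 &= 4, \\ x_1 - 2x_2 + x_3 &= 6, \\ -x_1 - 12x_2 + 5x_3 &= 10 \end{aligned} \]

has an infinite number of solutions. Compute $D_1$, $D_2$, and $D_3$.

d. Prove that if a $3 \times 3$ linear system with $D = 0$ has solutions, then $D_1 = D_2 = D_3 = 0$.

e. Determine the number of multiplications/divisions and additions/subtractions required for
Cramer's rule on a $3 \times 3$ system.

13. a. Generalize Cramer's rule to an $n \times n$ linear system.

b. Use the result in Exercise 9 to determine the number of multiplications/divisions and
additions/subtractions required for Cramer's rule on an $n \times n$ system.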


6.5 Matrix Factorization


Gaussian elimination is the principal tool in the direct solution of linear systems of equations,
so it should be no surprise that it appears in other guises. In this section we will see that
the steps used to solve a system of the form $Ax = b$ can be used to factor a matrix. The
factorization is particularly useful when it has the form $A = LU$, where $L$ is lower triangular
and $U$ is upper triangular. Although not all matrices have this type of representation, many
do that occur frequently in the application of numerical techniques.

In Section 6.1 we found that Gaussian elimination applied to an arbitrary linear system
$Ax = b$ requires $O(n^3/3)$ arithmetic operations to determine $x$. However, solving a linear
system whose coefficient matrix is upper triangular requires only backward substitution, which
takes $O(n^2)$ operations. The number of operations required to solve a lower-triangular
system is similar.

Suppose that A has been factored into the triangular form A = LU, where L is lower 
triangular and U is upper triangular. Then we can solve for x more easily by using a two-step 
process. 


• First we let $y = Ux$ and solve the lower triangular system $Ly = b$ for $y$. Since $L$ is
triangular, determining $y$ from this equation requires only $O(n^2)$ operations.

• Once $y$ is known, the upper triangular system $Ux = y$ requires only an additional $O(n^2)$
operations to determine the solution $x$.

Solving a linear system $Ax = b$ in factored form means that the number of operations
needed to solve the system $Ax = b$ is reduced from $O(n^3/3)$ to $O(2n^2)$.
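The two-step process can be sketched in Python as follows. The function names and the small $3 \times 3$ factors are assumptions chosen only for illustration; they do not come from the text. Each routine uses a doubly nested loop, which is where the $O(n^2)$ operation count comes from.

```python
def forward_substitution(L, b):
    """Solve Ly = b for a lower-triangular L with nonzero diagonal."""
    n = len(b)
    y = [0.0] * n
    for i in range(n):
        s = sum(L[i][j] * y[j] for j in range(i))
        y[i] = (b[i] - s) / L[i][i]
    return y

def backward_substitution(U, y):
    """Solve Ux = y for an upper-triangular U with nonzero diagonal."""
    n = len(y)
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(U[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (y[i] - s) / U[i][i]
    return x

# Hypothetical factors, chosen only for illustration.
L = [[1, 0, 0], [2, 1, 0], [-1, 3, 1]]
U = [[2, 1, 1], [0, 1, -1], [0, 0, 4]]
b = [5, 11, 2]                      # equals (LU)x for x = (1, 2, 1)

y = forward_substitution(L, b)      # step 1: Ly = b
x = backward_substitution(U, y)     # step 2: Ux = y
print(y, x)
```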


Example 1 Compare the approximate number of operations required to determine the solution to a
linear system using a technique requiring $O(n^3/3)$ operations and one requiring $O(2n^2)$
when $n = 10$, $n = 100$, and $n = 1000$.


Solution Table 6.3 gives the results of these calculations.

Table 6.3

$n$       $n^3/3$                $2n^2$               % Reduction
10        $3.3 \times 10^2$      $2 \times 10^2$      40
100       $3.3 \times 10^5$      $2 \times 10^4$      94
1000      $3.3 \times 10^8$      $2 \times 10^6$      99.4
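The entries of Table 6.3 can be checked with a short Python calculation. The helper name percent_reduction is an assumption made for this sketch.

```python
def percent_reduction(n):
    """Percentage saved by 2n^2 operations relative to n^3/3 operations."""
    elimination = n**3 / 3        # approximate cost of Gaussian elimination
    substitution = 2 * n**2       # forward plus backward substitution
    return 100 * (1 - substitution / elimination)

for n in (10, 100, 1000):
    print(f"{n:5d} {n**3 / 3:10.1e} {2 * n**2:10.1e} {percent_reduction(n):6.1f}")
```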


As the example illustrates, the percentage reduction increases dramatically with the size of
the matrix. Not surprisingly, the reductions from the factorization come at a cost; determining
the specific matrices $L$ and $U$ requires $O(n^3/3)$ operations. But once the factorization
is determined, systems involving the matrix $A$ can be solved in this simplified manner for
any number of vectors $b$.

To see which matrices have an LU factorization and to find how it is determined, first
suppose that Gaussian elimination can be performed on the system $Ax = b$ without row
interchanges. With the notation in Section 6.1, this is equivalent to having nonzero pivot
elements $a_{ii}^{(i)}$ for each $i = 1, 2, \ldots, n$.

The first step in the Gaussian elimination process consists of performing, for each
$j = 2, 3, \ldots, n$, the operations

\[ (E_j - m_{j,1}E_1) \to (E_j), \quad \text{where} \quad m_{j,1} = \frac{a_{j1}^{(1)}}{a_{11}^{(1)}}. \tag{6.8} \]

These operations transform the system into one in which all the entries in the first column
below the diagonal are zero.

Matrix factorization is another of the important techniques that Gauss seems to be the first to
have discovered. It is included in his two-volume treatise on celestial mechanics Theoria
motus corporum coelestium in sectionibus conicis Solem ambientium, which was
published in 1809.

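The system of operations in (6.8) can be viewed in another way. It is simultaneously
accomplished by multiplying the original matrix $A$ on the left by the matrix

\[ M^{(1)} = \begin{bmatrix} 1 & 0 & \cdots & \cdots & 0 \\ -m_{21} & 1 & \ddots & & \vdots \\ \vdots & 0 & \ddots & \ddots & \vdots \\ \vdots & \vdots & \ddots & \ddots & 0 \\ -m_{n1} & 0 & \cdots & 0 & 1 \end{bmatrix}. \]

This is called the first Gaussian transformation matrix. We denote the product of this
matrix with $A^{(1)} \equiv A$ by $A^{(2)}$ and with $b$ by $b^{(2)}$, so

\[ A^{(2)}x = M^{(1)}Ax = M^{(1)}b = b^{(2)}. \]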


In a similar manner we construct $M^{(2)}$, the identity matrix with the entries below the
diagonal in the second column replaced by the negatives of the multipliers

\[ m_{j,2} = \frac{a_{j2}^{(2)}}{a_{22}^{(2)}}. \]

The product of this matrix with $A^{(2)}$ has zeros below the diagonal in the first two columns,
and we let

\[ A^{(3)}x = M^{(2)}A^{(2)}x = M^{(2)}M^{(1)}Ax = M^{(2)}M^{(1)}b = b^{(3)}. \]


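In general, with $A^{(k)}x = b^{(k)}$ already formed, multiply by the kth Gaussian transformation
matrix

\[ M^{(k)} = \begin{bmatrix} 1 & & & & & \\ & \ddots & & & & \\ & & 1 & & & \\ & & -m_{k+1,k} & 1 & & \\ & & \vdots & & \ddots & \\ & & -m_{n,k} & & & 1 \end{bmatrix}, \]

in which the entries $-m_{k+1,k}, \ldots, -m_{n,k}$ lie in column $k$ and all entries not shown are zero,
to obtain

\[ A^{(k+1)}x = M^{(k)}A^{(k)}x = M^{(k)} \cdots M^{(1)}Ax = M^{(k)}b^{(k)} = b^{(k+1)} = M^{(k)} \cdots M^{(1)}b. \tag{6.9} \]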
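A small Python sketch can make the action of the matrices $M^{(k)}$ concrete. It is an illustration under stated assumptions: the helper names matmul and gaussian_transformation are chosen here, exact fractions are used, every pivot is assumed nonzero (a zero pivot would raise ZeroDivisionError), and the matrix is the one that appears in Example 2 of this section.

```python
from fractions import Fraction

def matmul(X, Y):
    """Plain matrix product of two lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

def gaussian_transformation(A_k, k):
    """Return M^(k) for the current matrix A^(k); k is 1-based, as in the text."""
    n = len(A_k)
    M = [[Fraction(int(i == j)) for j in range(n)] for i in range(n)]
    for j in range(k, n):             # 0-based rows k..n-1 are rows k+1..n of the text
        # entry -m_{j,k} = -a_jk^(k) / a_kk^(k), placed in column k
        M[j][k - 1] = -Fraction(A_k[j][k - 1]) / A_k[k - 1][k - 1]
    return M

A = [[1, 1, 0, 3], [2, 1, -1, 1], [3, -1, -1, 2], [-1, 2, 3, -1]]
current = A
for k in range(1, len(A)):
    # A^(k+1) = M^(k) A^(k), as in (6.9)
    current = matmul(gaussian_transformation(current, k), current)

for row in current:
    print([str(v) for v in row])
```

After the loop, current holds $A^{(4)}$, which is upper triangular.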


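The process ends with the formation of $A^{(n)}x = b^{(n)}$, where $A^{(n)}$ is the upper triangular
matrix

\[ A^{(n)} = \begin{bmatrix} a_{11}^{(1)} & a_{12}^{(1)} & \cdots & a_{1n}^{(1)} \\ 0 & a_{22}^{(2)} & \ddots & \vdots \\ \vdots & \ddots & \ddots & a_{n-1,n}^{(n-1)} \\ 0 & \cdots & 0 & a_{nn}^{(n)} \end{bmatrix}, \]

given by

\[ A^{(n)} = M^{(n-1)}M^{(n-2)} \cdots M^{(1)}A. \]

This process forms the $U = A^{(n)}$ portion of the matrix factorization $A = LU$. To
determine the complementary lower triangular matrix $L$, first recall the multiplication of
$A^{(k)}x = b^{(k)}$ by the Gaussian transformation $M^{(k)}$ used to obtain (6.9):

\[ A^{(k+1)}x = M^{(k)}A^{(k)}x = M^{(k)}b^{(k)} = b^{(k+1)}, \]

where $M^{(k)}$ generates the row operations

\[ (E_j - m_{j,k}E_k) \to (E_j), \quad \text{for } j = k+1, \ldots, n. \]

To reverse the effects of this transformation and return to $A^{(k)}$ requires that the operations
$(E_j + m_{j,k}E_k) \to (E_j)$ be performed for each $j = k+1, \ldots, n$. This is equivalent to
multiplying by the inverse of the matrix $M^{(k)}$, the matrix

\[ L^{(k)} = [M^{(k)}]^{-1} = \begin{bmatrix} 1 & & & & & \\ & \ddots & & & & \\ & & 1 & & & \\ & & m_{k+1,k} & 1 & & \\ & & \vdots & & \ddots & \\ & & m_{n,k} & & & 1 \end{bmatrix}, \]

in which the entries $m_{k+1,k}, \ldots, m_{n,k}$ lie in column $k$ and all entries not shown are zero.

The lower-triangular matrix $L$ in the factorization of $A$, then, is the product of the
matrices $L^{(k)}$:

\[ L = L^{(1)}L^{(2)} \cdots L^{(n-1)} = \begin{bmatrix} 1 & 0 & \cdots & 0 \\ m_{21} & 1 & \ddots & \vdots \\ \vdots & \ddots & \ddots & 0 \\ m_{n1} & \cdots & m_{n,n-1} & 1 \end{bmatrix}, \]

since the product of $L$ with the upper-triangular matrix $U = M^{(n-1)} \cdots M^{(2)}M^{(1)}A$ gives

\[ \begin{aligned} LU &= L^{(1)}L^{(2)} \cdots L^{(n-2)}L^{(n-1)} \cdot M^{(n-1)}M^{(n-2)} \cdots M^{(2)}M^{(1)}A \\
&= [M^{(1)}]^{-1}[M^{(2)}]^{-1} \cdots [M^{(n-2)}]^{-1}[M^{(n-1)}]^{-1} \cdot M^{(n-1)}M^{(n-2)} \cdots M^{(2)}M^{(1)}A = A. \end{aligned} \]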
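Theorem 6.19 follows from these observations.

Theorem 6.19 If Gaussian elimination can be performed on the linear system $Ax = b$ without row
interchanges, then the matrix $A$ can be factored into the product of a lower-triangular matrix $L$
and an upper-triangular matrix $U$, that is, $A = LU$, where $m_{ji} = a_{ji}^{(i)}/a_{ii}^{(i)}$,

\[ U = \begin{bmatrix} a_{11}^{(1)} & a_{12}^{(1)} & \cdots & a_{1n}^{(1)} \\ 0 & a_{22}^{(2)} & \ddots & \vdots \\ \vdots & \ddots & \ddots & a_{n-1,n}^{(n-1)} \\ 0 & \cdots & 0 & a_{nn}^{(n)} \end{bmatrix}, \quad \text{and} \quad
L = \begin{bmatrix} 1 & 0 & \cdots & 0 \\ m_{21} & 1 & \ddots & \vdots \\ \vdots & \ddots & \ddots & 0 \\ m_{n1} & \cdots & m_{n,n-1} & 1 \end{bmatrix}. \]

Example 2 (a) Determine the LU factorization for matrix $A$ in the linear system $Ax = b$, where

\[ A = \begin{bmatrix} 1 & 1 & 0 & 3 \\ 2 & 1 & -1 & 1 \\ 3 & -1 & -1 & 2 \\ -1 & 2 & 3 & -1 \end{bmatrix} \quad \text{and} \quad b = \begin{bmatrix} 4 \\ 1 \\ -3 \\ 4 \end{bmatrix}. \]

(b) Then use the factorization to solve the system

\[ \begin{aligned} x_1 + x_2 + 3x_4 &= 8, \\ 2x_1 + x_2 - x_3 + x_4 &= 7, \\ 3x_1 - x_2 - x_3 + 2x_4 &= 14, \\ -x_1 + 2x_2 + 3x_3 - x_4 &= -7. \end{aligned} \]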

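Solution (a) The original system was considered in Section 6.1, where we saw that the
sequence of operations $(E_2 - 2E_1) \to (E_2)$, $(E_3 - 3E_1) \to (E_3)$, $(E_4 - (-1)E_1) \to (E_4)$,
$(E_3 - 4E_2) \to (E_3)$, $(E_4 - (-3)E_2) \to (E_4)$ converts the system to the triangular system

\[ \begin{aligned} x_1 + x_2 + 3x_4 &= 4, \\ -x_2 - x_3 - 5x_4 &= -7, \\ 3x_3 + 13x_4 &= 13, \\ -13x_4 &= -13. \end{aligned} \]

The multipliers $m_{ij}$ and the upper triangular matrix produce the factorization

\[ A = \begin{bmatrix} 1 & 1 & 0 & 3 \\ 2 & 1 & -1 & 1 \\ 3 & -1 & -1 & 2 \\ -1 & 2 & 3 & -1 \end{bmatrix}
= \begin{bmatrix} 1 & 0 & 0 & 0 \\ 2 & 1 & 0 & 0 \\ 3 & 4 & 1 & 0 \\ -1 & -3 & 0 & 1 \end{bmatrix}
\begin{bmatrix} 1 & 1 & 0 & 3 \\ 0 & -1 & -1 & -5 \\ 0 & 0 & 3 & 13 \\ 0 & 0 & 0 & -13 \end{bmatrix} = LU. \]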
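(b) To solve

\[ Ax = LUx = \begin{bmatrix} 1 & 0 & 0 & 0 \\ 2 & 1 & 0 & 0 \\ 3 & 4 & 1 & 0 \\ -1 & -3 & 0 & 1 \end{bmatrix}
\begin{bmatrix} 1 & 1 & 0 & 3 \\ 0 & -1 & -1 & -5 \\ 0 & 0 & 3 & 13 \\ 0 & 0 & 0 & -13 \end{bmatrix}
\begin{bmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \end{bmatrix} = \begin{bmatrix} 8 \\ 7 \\ 14 \\ -7 \end{bmatrix}, \]

we first introduce the substitution $y = Ux$. Then $b = L(Ux) = Ly$. That is,

\[ Ly = \begin{bmatrix} 1 & 0 & 0 & 0 \\ 2 & 1 & 0 & 0 \\ 3 & 4 & 1 & 0 \\ -1 & -3 & 0 & 1 \end{bmatrix}
\begin{bmatrix} y_1 \\ y_2 \\ y_3 \\ y_4 \end{bmatrix} = \begin{bmatrix} 8 \\ 7 \\ 14 \\ -7 \end{bmatrix}. \]

This system is solved for $y$ by a simple forward-substitution process:

\[ \begin{aligned} &y_1 = 8; \\ &2y_1 + y_2 = 7, \quad \text{so } y_2 = 7 - 2y_1 = -9; \\ &3y_1 + 4y_2 + y_3 = 14, \quad \text{so } y_3 = 14 - 3y_1 - 4y_2 = 26; \\ &-y_1 - 3y_2 + y_4 = -7, \quad \text{so } y_4 = -7 + y_1 + 3y_2 = -26. \end{aligned} \]

We then solve $Ux = y$ for $x$, the solution of the original system; that is,

\[ \begin{bmatrix} 1 & 1 & 0 & 3 \\ 0 & -1 & -1 & -5 \\ 0 & 0 & 3 & 13 \\ 0 & 0 & 0 & -13 \end{bmatrix}
\begin{bmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \end{bmatrix} = \begin{bmatrix} 8 \\ -9 \\ 26 \\ -26 \end{bmatrix}. \]

Using backward substitution we obtain $x_4 = 2$, $x_3 = 0$, $x_2 = -1$, $x_1 = 3$.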
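The construction behind Theorem 6.19 (eliminate, and record each multiplier $m_{j,k}$ below the diagonal of $L$) can be sketched in Python and checked against part (a) of Example 2. The function name lu_by_elimination is an assumption made for this sketch, and exact fractions are used in place of floating-point arithmetic.

```python
from fractions import Fraction

def lu_by_elimination(A):
    """Doolittle factors from Gaussian elimination without row interchanges."""
    n = len(A)
    U = [[Fraction(v) for v in row] for row in A]
    L = [[Fraction(int(i == j)) for j in range(n)] for i in range(n)]
    for k in range(n - 1):
        if U[k][k] == 0:
            raise ZeroDivisionError("zero pivot: a row interchange is needed")
        for j in range(k + 1, n):
            m = U[j][k] / U[k][k]     # multiplier m_{j,k}
            L[j][k] = m               # stored below the diagonal of L
            U[j] = [u - m * p for u, p in zip(U[j], U[k])]
    return L, U

A = [[1, 1, 0, 3], [2, 1, -1, 1], [3, -1, -1, 2], [-1, 2, 3, -1]]
L, U = lu_by_elimination(A)
for row_l, row_u in zip(L, U):
    print([str(v) for v in row_l], [str(v) for v in row_u])
```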


The NumericalAnalysis subpackage of Maple can be used to perform the matrix factorization
in Example 2. First load the package

with(Student[NumericalAnalysis])

and the matrix A

A := Matrix([[1, 1, 0, 3], [2, 1, -1, 1], [3, -1, -1, 2], [-1, 2, 3, -1]])

The factorization is performed with the command

Lower, Upper := MatrixDecomposition(A, method = LU, output = ['L', 'U'])

giving

\[ \begin{bmatrix} 1 & 0 & 0 & 0 \\ 2 & 1 & 0 & 0 \\ 3 & 4 & 1 & 0 \\ -1 & -3 & 0 & 1 \end{bmatrix}, \quad
\begin{bmatrix} 1 & 1 & 0 & 3 \\ 0 & -1 & -1 & -5 \\ 0 & 0 & 3 & 13 \\ 0 & 0 & 0 & -13 \end{bmatrix} \]

To use the factorization to solve the system $Ax = b$, define b by

b := Vector([8, 7, 14, -7])

Then perform the forward substitution to determine $y$ with $Ly = b$, followed by backward
substitution to determine $x$ with $Ux = y$.

y := ForwardSubstitution(Lower, b): x := BackSubstitution(Upper, y)

The solution agrees with that in Example 2.

The factorization used in Example 2 is called Doolittle's method and requires that
1s be on the diagonal of $L$, which results in the factorization described in Theorem 6.19.
In Section 6.6, we consider Crout's method, a factorization which requires that 1s be on
the diagonal elements of $U$, and Cholesky's method, which requires that $l_{ii} = u_{ii}$ for
each $i$.

A general procedure for factoring matrices into a product of triangular matri- 
ces is contained in Algorithm 6.4. Although new matrices L and U are constructed, 
the generated values can replace the corresponding entries of A that are no longer 
needed. 

Algorithm 6.4 permits either the diagonal of $L$ or the diagonal of $U$ to be
specified.

LU Factorization

To factor the $n \times n$ matrix $A = [a_{ij}]$ into the product of the lower-triangular matrix $L = [l_{ij}]$
and the upper-triangular matrix $U = [u_{ij}]$; that is, $A = LU$, where the main diagonal of
either $L$ or $U$ consists of all ones:

INPUT dimension $n$; the entries $a_{ij}$, $1 \le i, j \le n$ of $A$; the diagonal $l_{11} = \cdots = l_{nn} = 1$
of $L$ or the diagonal $u_{11} = \cdots = u_{nn} = 1$ of $U$.

OUTPUT the entries $l_{ij}$, $1 \le j \le i$, $1 \le i \le n$ of $L$ and the entries $u_{ij}$, $i \le j \le n$,
$1 \le i \le n$ of $U$.

Step 1 Select $l_{11}$ and $u_{11}$ satisfying $l_{11}u_{11} = a_{11}$.
       If $l_{11}u_{11} = 0$ then OUTPUT ('Factorization impossible');
       STOP.

Step 2 For $j = 2, \ldots, n$ set $u_{1j} = a_{1j}/l_{11}$; (First row of U.)
       $l_{j1} = a_{j1}/u_{11}$. (First column of L.)

Step 3 For $i = 2, \ldots, n-1$ do Steps 4 and 5.

    Step 4 Select $l_{ii}$ and $u_{ii}$ satisfying $l_{ii}u_{ii} = a_{ii} - \sum_{k=1}^{i-1} l_{ik}u_{ki}$.
           If $l_{ii}u_{ii} = 0$ then OUTPUT ('Factorization impossible');
           STOP.

    Step 5 For $j = i+1, \ldots, n$
           set $u_{ij} = \frac{1}{l_{ii}} \left[ a_{ij} - \sum_{k=1}^{i-1} l_{ik}u_{kj} \right]$; (ith row of U.)
           $l_{ji} = \frac{1}{u_{ii}} \left[ a_{ji} - \sum_{k=1}^{i-1} l_{jk}u_{ki} \right]$. (ith column of L.)

Step 6 Select $l_{nn}$ and $u_{nn}$ satisfying $l_{nn}u_{nn} = a_{nn} - \sum_{k=1}^{n-1} l_{nk}u_{kn}$.
       (Note: If $l_{nn}u_{nn} = 0$, then $A = LU$ but $A$ is singular.)

Step 7 OUTPUT ($l_{ij}$ for $j = 1, \ldots, i$ and $i = 1, \ldots, n$);
       OUTPUT ($u_{ij}$ for $j = i, \ldots, n$ and $i = 1, \ldots, n$);
       STOP.
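The steps of the algorithm can be sketched in Python as follows. This is one possible rendering, not the text's own code: the function name lu_factor and the parameter unit_diagonal are assumptions made here, floating-point arithmetic is used, and Steps 1, 4, and 6 are merged into a single loop because they share the same formula.

```python
def lu_factor(A, unit_diagonal="L"):
    """Sketch of Algorithm 6.4; unit_diagonal selects which factor has ones."""
    n = len(A)
    L = [[0.0] * n for _ in range(n)]
    U = [[0.0] * n for _ in range(n)]
    for i in range(n):
        # Steps 1, 4, 6: l_ii * u_ii = a_ii - sum_{k<i} l_ik * u_ki
        d = A[i][i] - sum(L[i][k] * U[k][i] for k in range(i))
        if d == 0 and i < n - 1:
            raise ValueError("Factorization impossible")
        if unit_diagonal == "L":
            L[i][i], U[i][i] = 1.0, d      # ones on the diagonal of L
        else:
            L[i][i], U[i][i] = d, 1.0      # ones on the diagonal of U
        # Steps 2 and 5: ith row of U and ith column of L
        for j in range(i + 1, n):
            U[i][j] = (A[i][j] - sum(L[i][k] * U[k][j] for k in range(i))) / L[i][i]
            L[j][i] = (A[j][i] - sum(L[j][k] * U[k][i] for k in range(i))) / U[i][i]
    return L, U

A = [[1, 1, 0, 3], [2, 1, -1, 1], [3, -1, -1, 2], [-1, 2, 3, -1]]
L, U = lu_factor(A)                        # ones on the diagonal of L
Lc, Uc = lu_factor(A, unit_diagonal="U")   # ones on the diagonal of U
print(L, U, sep="\n")
```

As in Step 6, a zero product in the last diagonal position is allowed; the factorization still exists, but $A$ is singular.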


Once the matrix factorization is complete, the solution to a linear system of the form
$Ax = LUx = b$ is found by first letting $y = Ux$ and solving $Ly = b$ for $y$. Since $L$ is lower
triangular, we have

\[ y_1 = \frac{b_1}{l_{11}}, \]

and, for each $i = 2, 3, \ldots, n$,

\[ y_i = \frac{1}{l_{ii}} \left[ b_i - \sum_{j=1}^{i-1} l_{ij}y_j \right]. \]

After $y$ is found by this forward-substitution process, the upper-triangular system $Ux = y$
is solved for $x$ by backward substitution using the equations

\[ x_n = \frac{y_n}{u_{nn}} \quad \text{and} \quad x_i = \frac{1}{u_{ii}} \left[ y_i - \sum_{j=i+1}^{n} u_{ij}x_j \right]. \]

Permutation Matrices


In the previous discussion we assumed that Ax = b can be solved using Gaussian elimination 
without row interchanges. From a practical standpoint, this factorization is useful only when 
row interchanges are not required to control the round-off error resulting from the use of 
finite-digit arithmetic. Fortunately, many systems we encounter when using approximation 
methods are of this type, but we will now consider the modifications that must be made 
when row interchanges are required. We begin the discussion with the introduction of a 
class of matrices that are used to rearrange, or permute, rows of a given matrix. 

An $n \times n$ permutation matrix $P = [p_{ij}]$ is a matrix obtained by rearranging the rows
of $I_n$, the identity matrix. This gives a matrix with precisely one nonzero entry in each row
and in each column, and each nonzero entry is a 1.


Illustration The matrix

\[ P = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 0 & 1 \\ 0 & 1 & 0 \end{bmatrix} \]

is a $3 \times 3$ permutation matrix. For any $3 \times 3$ matrix $A$, multiplying on the left by $P$ has the
effect of interchanging the second and third rows of $A$:

\[ PA = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 0 & 1 \\ 0 & 1 & 0 \end{bmatrix}
\begin{bmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{bmatrix}
= \begin{bmatrix} a_{11} & a_{12} & a_{13} \\ a_{31} & a_{32} & a_{33} \\ a_{21} & a_{22} & a_{23} \end{bmatrix}. \]

Similarly, multiplying $A$ on the right by $P$ interchanges the second and third columns
of $A$.


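Two useful properties of permutation matrices relate to Gaussian elimination, the first
of which is illustrated in the previous example. Suppose $k_1, \ldots, k_n$ is a permutation of the
integers $1, \ldots, n$ and the permutation matrix $P = (p_{ij})$ is defined by

\[ p_{ij} = \begin{cases} 1, & \text{if } j = k_i, \\ 0, & \text{otherwise.} \end{cases} \]

Then

• $PA$ permutes the rows of $A$; that is,

\[ PA = \begin{bmatrix} a_{k_1 1} & a_{k_1 2} & \cdots & a_{k_1 n} \\ a_{k_2 1} & a_{k_2 2} & \cdots & a_{k_2 n} \\ \vdots & \vdots & \ddots & \vdots \\ a_{k_n 1} & a_{k_n 2} & \cdots & a_{k_n n} \end{bmatrix}. \]

(The matrix multiplication $AP$ permutes the columns of $A$.)

• $P^{-1}$ exists and $P^{-1} = P^t$.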
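Both properties can be checked numerically with a short Python sketch. The helper names are assumptions made here, and the entries of the test matrix are labels (11 stands in for $a_{11}$, and so on) chosen so that row and column movements are easy to see.

```python
def permutation_matrix(k):
    """P with p_ij = 1 when j = k_i; k is a permutation of 1..n (1-based)."""
    n = len(k)
    return [[1 if j + 1 == k[i] else 0 for j in range(n)] for i in range(n)]

def matmul(X, Y):
    """Plain matrix product of two lists of rows."""
    return [[sum(X[i][m] * Y[m][j] for m in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

def transpose(X):
    return [list(col) for col in zip(*X)]

k = [1, 3, 2]                       # the permutation behind the 3 x 3 illustration
P = permutation_matrix(k)
A = [[11, 12, 13], [21, 22, 23], [31, 32, 33]]

PA = matmul(P, A)                   # rows of A in the order k_1, k_2, k_3
AP = matmul(A, P)                   # columns of A rearranged
PPt = matmul(P, transpose(P))       # the identity, since the inverse of P is its transpose
print(PA, AP, PPt, sep="\n")
```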


At the end of Section 6.4 we saw that for any nonsingular matrix $A$, the linear system
$Ax = b$ can be solved by Gaussian elimination, with the possibility of row interchanges.
If we knew the row interchanges that were required to solve the system by Gaussian
elimination, we could arrange the original equations in an order that would ensure that no row
interchanges are needed. Hence there is a rearrangement of the equations in the system that
permits Gaussian elimination to proceed without row interchanges. This implies that for
any nonsingular matrix $A$, a permutation matrix $P$ exists for which the system

\[ PAx = Pb \]

can be solved without row interchanges. As a consequence, this matrix $PA$ can be factored
into

\[ PA = LU, \]

where $L$ is lower triangular and $U$ is upper triangular. Because $P^{-1} = P^t$, this produces the
factorization

\[ A = P^{-1}LU = (P^tL)U. \]

The matrix $U$ is still upper triangular, but $P^tL$ is not lower triangular unless $P = I$.


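Example 3 Determine a factorization in the form $A = (P^tL)U$ for the matrix

\[ A = \begin{bmatrix} 0 & 0 & -1 & 1 \\ 1 & 1 & -1 & 2 \\ -1 & -1 & 2 & 0 \\ 1 & 2 & 0 & 2 \end{bmatrix}. \]

Solution The matrix $A$ cannot have an LU factorization because $a_{11} = 0$. However, using
the row interchange $(E_1) \leftrightarrow (E_2)$, followed by $(E_3 + E_1) \to (E_3)$ and $(E_4 - E_1) \to (E_4)$,
produces

\[ \begin{bmatrix} 1 & 1 & -1 & 2 \\ 0 & 0 & -1 & 1 \\ 0 & 0 & 1 & 2 \\ 0 & 1 & 1 & 0 \end{bmatrix}. \]

Then the row interchange $(E_2) \leftrightarrow (E_4)$, followed by $(E_4 + E_3) \to (E_4)$, gives the matrix

\[ U = \begin{bmatrix} 1 & 1 & -1 & 2 \\ 0 & 1 & 1 & 0 \\ 0 & 0 & 1 & 2 \\ 0 & 0 & 0 & 3 \end{bmatrix}. \]

The permutation matrix associated with the row interchanges $(E_1) \leftrightarrow (E_2)$ and
$(E_2) \leftrightarrow (E_4)$ is

\[ P = \begin{bmatrix} 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 1 \\ 0 & 0 & 1 & 0 \\ 1 & 0 & 0 & 0 \end{bmatrix}, \quad \text{and} \quad
PA = \begin{bmatrix} 1 & 1 & -1 & 2 \\ 1 & 2 & 0 & 2 \\ -1 & -1 & 2 & 0 \\ 0 & 0 & -1 & 1 \end{bmatrix}. \]

Gaussian elimination is performed on $PA$ using the same operations as on $A$, except
without the row interchanges. That is, $(E_2 - E_1) \to (E_2)$, $(E_3 + E_1) \to (E_3)$, followed by
$(E_4 + E_3) \to (E_4)$. The nonzero multipliers for $PA$ are consequently

\[ m_{21} = 1, \quad m_{31} = -1, \quad \text{and} \quad m_{43} = -1, \]

and the LU factorization of $PA$ is

\[ PA = \begin{bmatrix} 1 & 0 & 0 & 0 \\ 1 & 1 & 0 & 0 \\ -1 & 0 & 1 & 0 \\ 0 & 0 & -1 & 1 \end{bmatrix}
\begin{bmatrix} 1 & 1 & -1 & 2 \\ 0 & 1 & 1 & 0 \\ 0 & 0 & 1 & 2 \\ 0 & 0 & 0 & 3 \end{bmatrix} = LU. \]

Multiplying by $P^{-1} = P^t$ produces the factorization

\[ A = P^{-1}(LU) = P^t(LU) = (P^tL)U = \begin{bmatrix} 0 & 0 & -1 & 1 \\ 1 & 0 & 0 & 0 \\ -1 & 0 & 1 & 0 \\ 1 & 1 & 0 & 0 \end{bmatrix}
\begin{bmatrix} 1 & 1 & -1 & 2 \\ 0 & 1 & 1 & 0 \\ 0 & 0 & 1 & 2 \\ 0 & 0 & 0 & 3 \end{bmatrix}. \]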
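The same factorization can be reproduced with a Python sketch. Its assumptions should be read carefully: the function name plu is chosen here, exact fractions are used, and rows are interchanged only when a pivot is zero, taking the first row below with a nonzero entry. That rule happens to reproduce the interchanges of this example, but it is not a pivoting strategy for controlling round-off.

```python
from fractions import Fraction

def plu(A):
    """Return (P, L, U) with PA = LU, interchanging rows only at a zero pivot."""
    n = len(A)
    # W holds U on and above the diagonal and the multipliers below it.
    W = [[Fraction(v) for v in row] for row in A]
    order = list(range(n))            # order[i] = original row now in position i
    for k in range(n - 1):
        p = next((r for r in range(k, n) if W[r][k] != 0), None)
        if p is None:
            continue                  # column k is already zero on and below the diagonal
        if p != k:
            W[k], W[p] = W[p], W[k]   # stored multipliers move with their rows
            order[k], order[p] = order[p], order[k]
        for j in range(k + 1, n):
            m = W[j][k] / W[k][k]
            tail = [w - m * q for w, q in zip(W[j][k + 1:], W[k][k + 1:])]
            W[j] = W[j][:k] + [m] + tail
    P = [[int(order[i] == j) for j in range(n)] for i in range(n)]
    L = [[W[i][j] if j < i else Fraction(int(i == j)) for j in range(n)] for i in range(n)]
    U = [[W[i][j] if j >= i else Fraction(0) for j in range(n)] for i in range(n)]
    return P, L, U

A = [[0, 0, -1, 1], [1, 1, -1, 2], [-1, -1, 2, 0], [1, 2, 0, 2]]
P, L, U = plu(A)
print(P)
```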


A matrix factorization of the form A = PLU for a matrix A can be obtained using the 
LinearAlgebra package of Maple with the command 


LUDecomposition(A) 
The function call 
(P,L,U) := LUDecomposition(A) 


gives the factorization, and stores the permutation matrix as P, the lower triangular matrix 
as L, and the upper triangular matrix as U. 


EXERCISE SET 6.5


1. Solve the following linear systems: 


1 0 0 2 3 -l xy 
a. 2 1 0 0 -2 1 mM |=] - 
-!1 0 1 0 0 3 X3 
2 0 0 1 1 1 xy —1 
b. -1 1 0 0 1 2 xy |= 
3 2 -1 0 0 1 x3 0 
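2. Solve the following linear systems:

a. \[ \begin{bmatrix} 1 & 0 & 0 \\ -2 & 1 & 0 \\ 3 & 0 & 1 \end{bmatrix} \begin{bmatrix} 2 & 1 & -1 \\ 0 & 4 & 2 \\ 0 & 0 & 5 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} = \begin{bmatrix} 1 \\ 0 \\ -5 \end{bmatrix} \]

b. \[ \begin{bmatrix} 1 & 0 & 0 \\ 2 & 1 & 0 \\ -3 & 2 & 1 \end{bmatrix} \begin{bmatrix} 1 & 2 & -3 \\ 0 & 1 & 2 \\ 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} = \begin{bmatrix} 4 \\ 6 \\ 8 \end{bmatrix} \]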


3. Consider the following matrices. Find the permutation matrix P so that PA can be factored into the 
product LU, where L is lower triangular with 1s on its diagonal and U is upper triangular for these 
matrices. 
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a | 
a A=|2 4 0 
0 1 -1 
E ie 
i a@: ws 
i cael ee ae 
Ee S| 


SRK OO KK CO 


1 1 
=2 =[ 
-1 1 


t 2 

A | 
-1 3 

> 0 | 


Consider the following matrices. Find the permutation matrix P so that PA can be factored into the 
product LU, where L is lower triangular with 1s on its diagonal and U is upper triangular for these 


matrices. 
0 2 -l 
a A= 1 -l 2 
1 -l 4 
1 1 -1l 2 
-1l -l 1 5 
de a ee 
2 3 4 5 


> 
ll 
NOK NOK 


2 —l 
4 7 
2 5 
1 -l 
2 4 
1 1 
3 4 


AarNNNW 


Factor the following matrices into the LU decomposition using the LU Factorization Algorithm with 


1, = 1 for all i. 


2 -1 1 
a. 3 3 
3 3 5 
2 0 0 0 
é 1 15 0 O | 
. 0 -3 05 0 
| 2-2 11 


b. 


d. 


1.012 
—2.132 
3.104 


2.1756 
—4.0231 
— 1.0000 

6.0235 


—2.132 3.104 
4.096 —7.013 
—7.013 0.014 
4.0231 —2.1732 5.1967 
6.0000 O 1.1973 
—5.2107 1.1111 0 
7.0000 0 —4.1561 


Factor the following matrices into the LU decomposition using the LU Factorization Algorithm with 


1, = 1 for all i. 


1 -1 0O 
a. 2 2 3 
—l 3. 2 
2 1 0 O 
c | -1 3 3 0 ] 
. 2 2 1 4 
| —2 225 | 


d. 


UID Ube wile 
WIN WIN NI 


0 
5.132 
—3.111 


1 
~4 
3 
8 
3 
8 
—3.460 O 5.217 
5.193 —2.197 4.206 
1.414 3.141 0 
= 12732 2.718 5212 | 


Modify the LU Factorization Algorithm so that it can be used to solve a linear system, and then solve 


the following linear systems. 
a 2x,- m+ x3=- 1, 


3x1+3x%2+9x3 = 0, 
3x +3x7+5x3 = 4. 


ec. 2x = 3, 
x, + 1.5x = 45, 
— 3x + 0.5x3 = —6.6, 
2x, - y+ xw,w+txuy=08. 


b. 


1.012x,; — 2.132x2 + 3.104x3 = 1.984, 


—2.132x, + 4.096x2 — 7.013x3 = —5.049, 


d. 2.1756x; + 4.023 1x. — 2.1732x3 + 5.1967x4 = 17.102, 


—4.0231x, + 6.0000x2 
—1.0000x; — 5.2107x2 + 1.11113 
6.0235x, + 7.0000x2 


+ 1.1973x4 = —6.1593, 
= 3.0004, 
— 4.1561x4 = 0.0000. 


3.104x, — 7.013x2 + 0.014x3 = —3.895. 
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8. Modify the LU Factorization Algorithm so that it can be used to solve a linear system, and then solve 
the following linear systems. 
a. X}— X) =2, bo oftx, + fm - ty = 1, 


3 2 4 
2x1 + 2x. a 3x3 = oh 2X, + 2x2 + 2x3 = 2. 
=x, + 3x2 + 2x3 = 4. 
2X = 2X2 + 2x3 = —3. 
b. 2x, + Xp = 0, dd. 2.121x,; — 3.460x + 5.217x4 = 1.909, 
—X, = 3x2 + 3x3 = 5, 5.193x = 2.197x3 + 4.206x4 = 0, 
2x, — 2X. + x3+4x4 = —2, 5.132x, + 1.414%) + 3.141x; = —2.101, 
—2x, + 2x. + 2x3 + 5x4 = 6. —3.111x, — 1.732x2 + 2.718x3 + 5.212x4 = 6.824. 
9. Obtain factorizations of the form A = P’LU for the following matrices. 
0 2 3 [1 2 —-1 
a A=] 1 1 -1 b A=] 1 2 3 
0 -1 1 [2 -l 4 
1 -2 3 0 [1 -2 3 0 
3-6 9 3 1 -—2 3 1 
A Se Pe fay Ol 
1 —-2 2 -2 2 1 3 -1l 


10. Suppose $A = P^tLU$, where $P$ is a permutation matrix, $L$ is a lower-triangular matrix with ones on the
diagonal, and $U$ is an upper-triangular matrix.

a. Count the number of operations needed to compute $P^tLU$ for a given matrix $A$.

b. Show that if $P$ contains $k$ row interchanges, then

\[ \det P = \det P^t = (-1)^k. \]

c. Use $\det A = \det P^t \det L \det U = (-1)^k \det U$ to count the number of operations for determining
$\det A$ by factoring.

d. Compute $\det A$ and count the number of operations when

\[ A = \begin{bmatrix} 0 & 2 & 1 & 4 & -1 & 3 \\ 1 & 2 & -1 & 3 & 4 & 0 \\ 0 & 1 & 1 & -1 & 2 & -1 \\ 2 & 3 & -4 & 2 & 0 & 5 \\ 1 & 1 & 1 & 3 & 0 & 2 \\ -1 & -1 & 2 & -1 & 2 & 0 \end{bmatrix}. \]


11. a. Show that the LU Factorization Algorithm requires

\[ \tfrac{1}{3}n^3 - \tfrac{1}{3}n \ \text{multiplications/divisions and} \ \tfrac{1}{3}n^3 - \tfrac{1}{2}n^2 + \tfrac{1}{6}n \ \text{additions/subtractions.} \]

b. Show that solving $Ly = b$, where $L$ is a lower-triangular matrix with $l_{ii} = 1$ for all $i$, requires

\[ \tfrac{1}{2}n^2 - \tfrac{1}{2}n \ \text{multiplications/divisions and} \ \tfrac{1}{2}n^2 - \tfrac{1}{2}n \ \text{additions/subtractions.} \]

c. Show that solving $Ax = b$ by first factoring $A$ into $A = LU$ and then solving $Ly = b$ and $Ux = y$
requires the same number of operations as the Gaussian Elimination Algorithm 6.1.

d. Count the number of operations required to solve $m$ linear systems $Ax^{(k)} = b^{(k)}$ for $k = 1, \ldots, m$
by first factoring $A$ and then using the method of part (c) $m$ times.


6.6 Special Types of Matrices


We now turn attention to two classes of matrices for which Gaussian elimination can be
performed effectively without row interchanges.

Diagonally Dominant Matrices

The first class is described in the following definition.


Definition 6.20 The $n \times n$ matrix $A$ is said to be diagonally dominant when

\[ |a_{ii}| \ge \sum_{\substack{j=1 \\ j \ne i}}^{n} |a_{ij}| \quad \text{holds for each } i = 1, 2, \ldots, n. \tag{6.10} \]

A diagonally dominant matrix is said to be strictly diagonally dominant when the
inequality in (6.10) is strict for each $i$, that is, when

\[ |a_{ii}| > \sum_{\substack{j=1 \\ j \ne i}}^{n} |a_{ij}| \quad \text{holds for each } i = 1, 2, \ldots, n. \]

Each main diagonal entry in a strictly diagonally dominant matrix has a magnitude that is
strictly greater than the sum of the magnitudes of all the other entries in that row.

Illustration Consider the matrices

\[ A = \begin{bmatrix} 7 & 2 & 0 \\ 3 & 5 & -1 \\ 0 & 5 & -6 \end{bmatrix} \quad \text{and} \quad
B = \begin{bmatrix} 6 & 4 & -3 \\ 4 & -2 & 0 \\ -3 & 0 & 1 \end{bmatrix}. \]

The nonsymmetric matrix $A$ is strictly diagonally dominant because

\[ |7| > |2| + |0|, \quad |5| > |3| + |-1|, \quad \text{and} \quad |-6| > |0| + |5|. \]

The symmetric matrix $B$ is not strictly diagonally dominant because, for example, in the
first row the absolute value of the diagonal element is $|6| < |4| + |-3| = 7$. It is interesting
to note that $A^t$ is not strictly diagonally dominant, because the middle row of $A^t$ is [2 5 5],
nor, of course, is $B^t$ because $B^t = B$.
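The row-by-row test in the definition translates directly into Python. The function name is an assumption made for this sketch; the matrices are those of the Illustration.

```python
def is_strictly_diagonally_dominant(A):
    """True when |a_ii| exceeds the sum of the other magnitudes in every row i."""
    return all(
        abs(row[i]) > sum(abs(v) for j, v in enumerate(row) if j != i)
        for i, row in enumerate(A)
    )

A = [[7, 2, 0], [3, 5, -1], [0, 5, -6]]
B = [[6, 4, -3], [4, -2, 0], [-3, 0, 1]]
At = [list(col) for col in zip(*A)]       # transpose of A

print(is_strictly_diagonally_dominant(A))
print(is_strictly_diagonally_dominant(B))
print(is_strictly_diagonally_dominant(At))
```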


The following theorem was used in Section 3.5 to ensure that there are unique solutions 
to the linear systems needed to determine cubic spline interpolants. 


Theorem 6.21 A strictly diagonally dominant matrix $A$ is nonsingular. Moreover, in this case, Gaussian
elimination can be performed on any linear system of the form $Ax = b$ to obtain its unique
solution without row or column interchanges, and the computations will be stable with
respect to the growth of round-off errors.


Proof We first use proof by contradiction to show that $A$ is nonsingular. Consider the linear
system described by $Ax = 0$, and suppose that a nonzero solution $x = (x_i)$ to this system
exists. Let $k$ be an index for which

\[ 0 < |x_k| = \max_{1 \le j \le n} |x_j|. \]

Because $\sum_{j=1}^{n} a_{ij}x_j = 0$ for each $i = 1, 2, \ldots, n$, we have, when $i = k$,

\[ a_{kk}x_k = -\sum_{\substack{j=1 \\ j \ne k}}^{n} a_{kj}x_j. \]

From the triangle inequality we have

\[ |a_{kk}||x_k| \le \sum_{\substack{j=1 \\ j \ne k}}^{n} |a_{kj}||x_j|, \quad \text{so} \quad
|a_{kk}| \le \sum_{\substack{j=1 \\ j \ne k}}^{n} |a_{kj}| \frac{|x_j|}{|x_k|} \le \sum_{\substack{j=1 \\ j \ne k}}^{n} |a_{kj}|. \]

This inequality contradicts the strict diagonal dominance of $A$. Consequently, the only
solution to $Ax = 0$ is $x = 0$. This is shown in Theorem 6.17 on page 398 to be equivalent
to the nonsingularity of $A$.

To prove that Gaussian elimination can be performed without row interchanges, we
show that each of the matrices $A^{(2)}, A^{(3)}, \ldots, A^{(n)}$ generated by the Gaussian elimination
process (and described in Section 6.5) is strictly diagonally dominant. This will ensure that
at each stage of the Gaussian elimination process the pivot element is nonzero.

Since $A$ is strictly diagonally dominant, $a_{11} \ne 0$ and $A^{(2)}$ can be formed. Thus for each
$i = 2, 3, \ldots, n$,

\[ a_{ij}^{(2)} = a_{ij}^{(1)} - \frac{a_{1j}^{(1)}a_{i1}^{(1)}}{a_{11}^{(1)}}, \quad \text{for } 2 \le j \le n. \]

First, $a_{i1}^{(2)} = 0$. The triangle inequality implies that

\[ \sum_{\substack{j=2 \\ j \ne i}}^{n} |a_{ij}^{(2)}| = \sum_{\substack{j=2 \\ j \ne i}}^{n} \left| a_{ij}^{(1)} - \frac{a_{1j}^{(1)}a_{i1}^{(1)}}{a_{11}^{(1)}} \right|
\le \sum_{\substack{j=2 \\ j \ne i}}^{n} |a_{ij}^{(1)}| + \sum_{\substack{j=2 \\ j \ne i}}^{n} \left| \frac{a_{1j}^{(1)}a_{i1}^{(1)}}{a_{11}^{(1)}} \right|. \]

But since $A$ is strictly diagonally dominant,

\[ \sum_{\substack{j=2 \\ j \ne i}}^{n} |a_{ij}^{(1)}| < |a_{ii}^{(1)}| - |a_{i1}^{(1)}| \quad \text{and} \quad
\sum_{\substack{j=2 \\ j \ne i}}^{n} |a_{1j}^{(1)}| < |a_{11}^{(1)}| - |a_{1i}^{(1)}|, \]

so

\[ \sum_{\substack{j=2 \\ j \ne i}}^{n} |a_{ij}^{(2)}| < |a_{ii}^{(1)}| - |a_{i1}^{(1)}| + \frac{|a_{i1}^{(1)}|}{|a_{11}^{(1)}|} \left( |a_{11}^{(1)}| - |a_{1i}^{(1)}| \right)
= |a_{ii}^{(1)}| - \frac{|a_{i1}^{(1)}||a_{1i}^{(1)}|}{|a_{11}^{(1)}|}. \]

The triangle inequality also implies that

\[ |a_{ii}^{(1)}| - \frac{|a_{i1}^{(1)}||a_{1i}^{(1)}|}{|a_{11}^{(1)}|} \le \left| a_{ii}^{(1)} - \frac{a_{i1}^{(1)}a_{1i}^{(1)}}{a_{11}^{(1)}} \right| = |a_{ii}^{(2)}|, \]

which gives

\[ \sum_{\substack{j=2 \\ j \ne i}}^{n} |a_{ij}^{(2)}| < |a_{ii}^{(2)}|. \]

This establishes the strict diagonal dominance for rows $2, \ldots, n$. But the first row of $A^{(2)}$
and $A$ are the same, so $A^{(2)}$ is strictly diagonally dominant.

This process is continued inductively until the upper-triangular and strictly diagonally
dominant $A^{(n)}$ is obtained. This implies that all the diagonal elements are nonzero, so
Gaussian elimination can be performed without row interchanges.

The demonstration of stability for this procedure can be found in [We].

Positive Definite Matrices


The next special class of matrices is called positive definite.


Definition 6.22 A matrix $A$ is positive definite if it is symmetric and if $x^tAx > 0$ for every $n$-dimensional
vector $x \ne 0$.

The name positive definite refers to the fact that the number $x^tAx$ must be positive whenever $x \ne 0$.

Not all authors require symmetry of a positive definite matrix. For example, Golub
and Van Loan [GV], a standard reference in matrix methods, requires only that $x^tAx > 0$
for each $x \ne 0$. Matrices we call positive definite are called symmetric positive definite in
[GV]. Keep this discrepancy in mind if you are using material from other sources.

To be precise, Definition 6.22 should specify that the $1 \times 1$ matrix generated by the
operation $x^tAx$ has a positive value for its only entry since the operation is performed as
follows:

\[ \begin{aligned} x^tAx &= [x_1, x_2, \cdots, x_n] \begin{bmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ a_{21} & a_{22} & \cdots & a_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ a_{n1} & a_{n2} & \cdots & a_{nn} \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{bmatrix} \\
&= [x_1, x_2, \cdots, x_n] \begin{bmatrix} \sum_{j=1}^{n} a_{1j}x_j \\ \sum_{j=1}^{n} a_{2j}x_j \\ \vdots \\ \sum_{j=1}^{n} a_{nj}x_j \end{bmatrix} = \left[ \sum_{i=1}^{n} \sum_{j=1}^{n} a_{ij}x_ix_j \right]. \end{aligned} \]
Example 1 Show that the matrix

\[ A = \begin{bmatrix} 2 & -1 & 0 \\ -1 & 2 & -1 \\ 0 & -1 & 2 \end{bmatrix} \]

is positive definite.

Solution Suppose $x$ is any three-dimensional column vector. Then

\[ \begin{aligned} x^tAx &= [x_1, x_2, x_3] \begin{bmatrix} 2 & -1 & 0 \\ -1 & 2 & -1 \\ 0 & -1 & 2 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix}
= [x_1, x_2, x_3] \begin{bmatrix} 2x_1 - x_2 \\ -x_1 + 2x_2 - x_3 \\ -x_2 + 2x_3 \end{bmatrix} \\
&= 2x_1^2 - 2x_1x_2 + 2x_2^2 - 2x_2x_3 + 2x_3^2. \end{aligned} \]

Rearranging the terms gives

\[ \begin{aligned} x^tAx &= x_1^2 + (x_1^2 - 2x_1x_2 + x_2^2) + (x_2^2 - 2x_2x_3 + x_3^2) + x_3^2 \\
&= x_1^2 + (x_1 - x_2)^2 + (x_2 - x_3)^2 + x_3^2, \end{aligned} \]

which implies that

\[ x_1^2 + (x_1 - x_2)^2 + (x_2 - x_3)^2 + x_3^2 > 0 \]

unless $x_1 = x_2 = x_3 = 0$.

It should be clear from the example that using the definition to determine if a matrix is
positive definite can be difficult. Fortunately, there are more easily verified criteria, which
are presented in Chapter 9, for identifying members of this important class. The next result
provides some necessary conditions that can be used to eliminate certain matrices from
consideration.

Theorem 6.23 If $A$ is an $n \times n$ positive definite matrix, then

(i) $A$ has an inverse;

(ii) $a_{ii} > 0$, for each $i = 1, 2, \ldots, n$;

(iii) $\max_{1 \le k, j \le n} |a_{kj}| \le \max_{1 \le i \le n} |a_{ii}|$;

(iv) $(a_{ij})^2 < a_{ii}a_{jj}$, for each $i \ne j$.


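Proof

(i) If $x$ satisfies $Ax = 0$, then $x^tAx = 0$. Since $A$ is positive definite, this implies
$x = 0$. Consequently, $Ax = 0$ has only the zero solution. By Theorem 6.17 on
page 398, this is equivalent to $A$ being nonsingular.

(ii) For a given $i$, let $x = (x_j)$ be defined by $x_i = 1$ and $x_j = 0$, if $j \ne i$. Since $x \ne 0$,

\[ 0 < x^tAx = a_{ii}. \]

(iii) For $k \ne j$, define $x = (x_i)$ by

\[ x_i = \begin{cases} 0, & \text{if } i \ne j \text{ and } i \ne k, \\ 1, & \text{if } i = j, \\ -1, & \text{if } i = k. \end{cases} \]

Since $x \ne 0$,

\[ 0 < x^tAx = a_{jj} + a_{kk} - a_{jk} - a_{kj}. \]

But $A^t = A$, so $a_{jk} = a_{kj}$, which implies that

\[ 2a_{kj} < a_{jj} + a_{kk}. \tag{6.11} \]

Now define $z = (z_i)$ by

\[ z_i = \begin{cases} 0, & \text{if } i \ne j \text{ and } i \ne k, \\ 1, & \text{if } i = j \text{ or } i = k. \end{cases} \]

Then $z^tAz > 0$, so

\[ -2a_{kj} < a_{kk} + a_{jj}. \tag{6.12} \]

Equations (6.11) and (6.12) imply that for each $k \ne j$,

\[ |a_{kj}| < \frac{a_{kk} + a_{jj}}{2} \le \max_{1 \le i \le n} |a_{ii}|, \quad \text{so} \quad \max_{1 \le k, j \le n} |a_{kj}| \le \max_{1 \le i \le n} |a_{ii}|. \]

(iv) For $i \ne j$, define $x = (x_k)$ by

\[ x_k = \begin{cases} 0, & \text{if } k \ne j \text{ and } k \ne i, \\ \alpha, & \text{if } k = i, \\ 1, & \text{if } k = j, \end{cases} \]

where $\alpha$ represents an arbitrary real number. Because $x \ne 0$,

\[ 0 < x^tAx = a_{ii}\alpha^2 + 2a_{ij}\alpha + a_{jj}. \]

As a quadratic polynomial in $\alpha$ with no real roots, the discriminant of $P(\alpha) =
a_{ii}\alpha^2 + 2a_{ij}\alpha + a_{jj}$ must be negative. Thus

\[ 4a_{ij}^2 - 4a_{ii}a_{jj} < 0 \quad \text{and} \quad a_{ij}^2 < a_{ii}a_{jj}. \]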
Although Theorem 6.23 provides some important conditions that must be true of posi- 
tive definite matrices, it does not ensure that a matrix satisfying these conditions is positive 
definite. 
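
The following is a minimal sketch of how conditions (ii) through (iv) of Theorem 6.23 might be checked numerically, together with a matrix that passes them but is not positive definite. The function names and the matrix `B` are illustrative assumptions, not taken from the text. `B` is also nonsingular, so it satisfies condition (i) as well.

```python
def satisfies_necessary_conditions(A):
    """Check conditions (ii)-(iv) of Theorem 6.23 for a square matrix A."""
    n = len(A)
    # (ii) every diagonal entry is positive
    diag_positive = all(A[i][i] > 0 for i in range(n))
    # (iii) the largest entry in magnitude is attained on the diagonal
    max_entry = max(abs(A[k][j]) for k in range(n) for j in range(n))
    max_diag = max(abs(A[i][i]) for i in range(n))
    max_on_diag = max_entry <= max_diag
    # (iv) a_ij^2 < a_ii * a_jj for each i != j
    products_ok = all(A[i][j] ** 2 < A[i][i] * A[j][j]
                      for i in range(n) for j in range(n) if i != j)
    return diag_positive and max_on_diag and products_ok


def quadratic_form(A, x):
    """Return x^t A x."""
    n = len(A)
    return sum(x[i] * A[i][j] * x[j] for i in range(n) for j in range(n))


# Hypothetical symmetric matrix chosen for illustration (not from the text).
B = [[1.0, -0.9, -0.9],
     [-0.9, 1.0, -0.9],
     [-0.9, -0.9, 1.0]]
passes = satisfies_necessary_conditions(B)
value = quadratic_form(B, [1.0, 1.0, 1.0])  # negative, so B is not positive definite
```

Because the quadratic form is negative at $\mathbf{x} = (1, 1, 1)^t$, the matrix `B` fails the definition even though it passes every necessary condition.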
The following notion will be used to provide a necessary and sufficient condition. 


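Definition 6.24 A leading principal submatrix of a matrix $A$ is a matrix of the form
\[ A_k = \begin{bmatrix} a_{11} & a_{12} & \cdots & a_{1k} \\ a_{21} & a_{22} & \cdots & a_{2k} \\ \vdots & \vdots & & \vdots \\ a_{k1} & a_{k2} & \cdots & a_{kk} \end{bmatrix} \]
for some $1 \le k \le n$.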


A proof of the following result can be found in [Stew2], p. 250. 


Theorem 6.25 A symmetric matrix $A$ is positive definite if and only if each of its leading principal submatrices has a positive determinant.


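Example 2 In Example 1 we used the definition to show that the symmetric matrix
\[ A = \begin{bmatrix} 2 & -1 & 0 \\ -1 & 2 & -1 \\ 0 & -1 & 2 \end{bmatrix} \]
is positive definite. Confirm this using Theorem 6.25.

Solution Note that
\[ \det A_1 = \det[2] = 2 > 0, \]
\[ \det A_2 = \det \begin{bmatrix} 2 & -1 \\ -1 & 2 \end{bmatrix} = 4 - 1 = 3 > 0, \]
and
\[ \det A_3 = \det \begin{bmatrix} 2 & -1 & 0 \\ -1 & 2 & -1 \\ 0 & -1 & 2 \end{bmatrix} = 2 \det \begin{bmatrix} 2 & -1 \\ -1 & 2 \end{bmatrix} - (-1) \det \begin{bmatrix} -1 & -1 \\ 0 & 2 \end{bmatrix} = 2(4 - 1) + (-2 + 0) = 4 > 0, \]
in agreement with Theorem 6.25.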
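
The test in Theorem 6.25 can be sketched in code as follows. This is only an illustration under stated assumptions: the helper names are hypothetical, the determinants are computed by Gaussian elimination with partial pivoting, and the symmetry check uses exact equality, which suits small hand-entered matrices rather than noisy data.

```python
def determinant(M):
    """Determinant by Gaussian elimination with partial pivoting."""
    a = [row[:] for row in M]          # work on a copy
    n = len(a)
    det = 1.0
    for k in range(n):
        # choose the row with the largest pivot candidate in column k
        p = max(range(k, n), key=lambda r: abs(a[r][k]))
        if a[p][k] == 0:
            return 0.0
        if p != k:
            a[k], a[p] = a[p], a[k]
            det = -det                 # a row interchange flips the sign
        det *= a[k][k]
        for r in range(k + 1, n):
            m = a[r][k] / a[k][k]
            for c in range(k, n):
                a[r][c] -= m * a[k][c]
    return det


def leading_principal_minors(A):
    """Determinants of the leading principal submatrices A_1, ..., A_n."""
    return [determinant([row[:k] for row in A[:k]])
            for k in range(1, len(A) + 1)]


def is_positive_definite(A):
    """Symmetric with all leading principal minors positive (Theorem 6.25)."""
    n = len(A)
    symmetric = all(A[i][j] == A[j][i] for i in range(n) for j in range(n))
    return symmetric and all(d > 0 for d in leading_principal_minors(A))


A = [[2.0, -1.0, 0.0],
     [-1.0, 2.0, -1.0],
     [0.0, -1.0, 2.0]]
minors = leading_principal_minors(A)   # approximately 2, 3, 4 as in Example 2
```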


The next result extends part (i) of Theorem 6.23 and parallels the strictly diagonally 
dominant results presented in Theorem 6.21 on page 412. We will not give a proof of this 
theorem because it requires introducing terminology and results that are not needed for any 
other purpose. The development and proof can be found in [We], pp. 120 ff. 
Theorem 6.26 The symmetric matrix $A$ is positive definite if and only if Gaussian elimination without row interchanges can be performed on the linear system $A\mathbf{x} = \mathbf{b}$ with all pivot elements positive. Moreover, in this case, the computations are stable with respect to the growth of round-off errors.


Some interesting facts that are uncovered in constructing the proof of Theorem 6.26 
are presented in the following corollaries. 


Corollary 6.27 The matrix $A$ is positive definite if and only if $A$ can be factored in the form $LDL^t$, where $L$ is lower triangular with 1s on its diagonal and $D$ is a diagonal matrix with positive diagonal entries.


Corollary 6.28 The matrix $A$ is positive definite if and only if $A$ can be factored in the form $LL^t$, where $L$ is lower triangular with nonzero diagonal entries.


The matrix $L$ in this corollary is not the same as the matrix $L$ in Corollary 6.27. A relationship between them is presented in Exercise 32.

Algorithm 6.5 is based on the LU Factorization Algorithm 6.4 and obtains the $LDL^t$ factorization described in Corollary 6.27.


$LDL^t$ Factorization


To factor the positive definite $n \times n$ matrix $A$ into the form $LDL^t$, where $L$ is a lower triangular matrix with 1s along the diagonal and $D$ is a diagonal matrix with positive entries on the diagonal:

INPUT the dimension $n$; entries $a_{ij}$, for $1 \le i, j \le n$ of $A$.
OUTPUT the entries $l_{ij}$, for $1 \le j < i$ and $1 \le i \le n$ of $L$, and $d_i$, for $1 \le i \le n$ of $D$.

Step 1 For $i = 1, \ldots, n$ do Steps 2-4.

    Step 2 For $j = 1, \ldots, i - 1$, set $v_j = l_{ij}d_j$.

    Step 3 Set $d_i = a_{ii} - \sum_{j=1}^{i-1} l_{ij}v_j$.

    Step 4 For $j = i + 1, \ldots, n$ set $l_{ji} = \left(a_{ji} - \sum_{k=1}^{i-1} l_{jk}v_k\right)/d_i$.

Step 5 OUTPUT ($l_{ij}$ for $j = 1, \ldots, i - 1$ and $i = 1, \ldots, n$);
       OUTPUT ($d_i$ for $i = 1, \ldots, n$);
       STOP.
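
The steps of Algorithm 6.5 can be sketched in Python as shown below. The function name `ldlt` and the list-of-lists storage are assumptions made for illustration; indices are shifted to start at 0, and no check is made that the computed $d_i$ are nonzero.

```python
def ldlt(A):
    """Return (L, d) with A = L * diag(d) * L^t, following Algorithm 6.5."""
    n = len(A)
    L = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    d = [0.0] * n
    for i in range(n):                                          # Step 1
        v = [L[i][j] * d[j] for j in range(i)]                  # Step 2
        d[i] = A[i][i] - sum(L[i][j] * v[j] for j in range(i))  # Step 3
        for j in range(i + 1, n):                               # Step 4
            L[j][i] = (A[j][i] - sum(L[j][k] * v[k] for k in range(i))) / d[i]
    return L, d


# Matrix used in Example 3 of this section.
A = [[4.0, -1.0, 1.0],
     [-1.0, 4.25, 2.75],
     [1.0, 2.75, 3.5]]
L, d = ldlt(A)
```

For this matrix the sketch gives $d = (4, 4, 1)$ and the subdiagonal entries $l_{21} = -0.25$, $l_{31} = 0.25$, $l_{32} = 0.75$.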


The NumericalAnalysis subpackage factors a positive definite matrix $A$ as $LDL^t$ with 
the command 


L, DD, Lt := MatrixDecomposition(A, method = LDLt) 


Corollary 6.27 has a counterpart when A is symmetric but not necessarily positive 
definite. This result is widely applied because symmetric matrices are common and easily 
recognized. 
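Corollary 6.29 Let $A$ be a symmetric $n \times n$ matrix for which Gaussian elimination can be applied without row interchanges. Then $A$ can be factored into $LDL^t$, where $L$ is lower triangular with 1s on its diagonal and $D$ is the diagonal matrix with $a_{11}^{(1)}, \ldots, a_{nn}^{(n)}$ on its diagonal.


Example 3 Determine the $LDL^t$ factorization of the positive definite matrix
\[ A = \begin{bmatrix} 4 & -1 & 1 \\ -1 & 4.25 & 2.75 \\ 1 & 2.75 & 3.5 \end{bmatrix}. \]

Solution The $LDL^t$ factorization has 1s on the diagonal of the lower triangular matrix $L$, so we need to have
\[ A = \begin{bmatrix} a_{11} & a_{21} & a_{31} \\ a_{21} & a_{22} & a_{32} \\ a_{31} & a_{32} & a_{33} \end{bmatrix} = \begin{bmatrix} 1 & 0 & 0 \\ l_{21} & 1 & 0 \\ l_{31} & l_{32} & 1 \end{bmatrix} \begin{bmatrix} d_1 & 0 & 0 \\ 0 & d_2 & 0 \\ 0 & 0 & d_3 \end{bmatrix} \begin{bmatrix} 1 & l_{21} & l_{31} \\ 0 & 1 & l_{32} \\ 0 & 0 & 1 \end{bmatrix} \]
\[ = \begin{bmatrix} d_1 & d_1 l_{21} & d_1 l_{31} \\ d_1 l_{21} & d_2 + d_1 l_{21}^2 & d_2 l_{32} + d_1 l_{21} l_{31} \\ d_1 l_{31} & d_1 l_{21} l_{31} + d_2 l_{32} & d_1 l_{31}^2 + d_2 l_{32}^2 + d_3 \end{bmatrix}. \]

Thus
\[ a_{11}: \; 4 = d_1 \implies d_1 = 4, \qquad a_{21}: \; -1 = d_1 l_{21} \implies l_{21} = -0.25, \]
\[ a_{31}: \; 1 = d_1 l_{31} \implies l_{31} = 0.25, \qquad a_{22}: \; 4.25 = d_2 + d_1 l_{21}^2 \implies d_2 = 4, \]
\[ a_{32}: \; 2.75 = d_1 l_{21} l_{31} + d_2 l_{32} \implies l_{32} = 0.75, \qquad a_{33}: \; 3.5 = d_1 l_{31}^2 + d_2 l_{32}^2 + d_3 \implies d_3 = 1, \]
and we have
\[ A = LDL^t = \begin{bmatrix} 1 & 0 & 0 \\ -0.25 & 1 & 0 \\ 0.25 & 0.75 & 1 \end{bmatrix} \begin{bmatrix} 4 & 0 & 0 \\ 0 & 4 & 0 \\ 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} 1 & -0.25 & 0.25 \\ 0 & 1 & 0.75 \\ 0 & 0 & 1 \end{bmatrix}. \]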


Algorithm 6.5 is easily modified to factor the symmetric matrices described in 
Corollary 6.29. It simply requires adding a check to ensure that the diagonal elements 
are nonzero. The Cholesky Algorithm 6.6 produces the LL' factorization described in 
Corollary 6.28. 


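Cholesky


To factor the positive definite $n \times n$ matrix $A$ into $LL^t$, where $L$ is lower triangular:

INPUT the dimension $n$; entries $a_{ij}$, for $1 \le i, j \le n$ of $A$.

OUTPUT the entries $l_{ij}$, for $1 \le j \le i$ and $1 \le i \le n$ of $L$. (The entries of $U = L^t$ are $u_{ij} = l_{ji}$, for $i \le j \le n$ and $1 \le i \le n$.)

Step 1 Set $l_{11} = \sqrt{a_{11}}$.

Step 2 For $j = 2, \ldots, n$, set $l_{j1} = a_{j1}/l_{11}$.

Step 3 For $i = 2, \ldots, n - 1$ do Steps 4 and 5.

    Step 4 Set $l_{ii} = \left(a_{ii} - \sum_{k=1}^{i-1} l_{ik}^2\right)^{1/2}$.

    Step 5 For $j = i + 1, \ldots, n$
           set $l_{ji} = \left(a_{ji} - \sum_{k=1}^{i-1} l_{jk}l_{ik}\right)/l_{ii}$.

Step 6 Set $l_{nn} = \left(a_{nn} - \sum_{k=1}^{n-1} l_{nk}^2\right)^{1/2}$.

Step 7 OUTPUT ($l_{ij}$ for $j = 1, \ldots, i$ and $i = 1, \ldots, n$);
       STOP.


Andre-Louis Cholesky (1875-1918) was a French military officer involved in geodesy and surveying in the early 1900s. He developed this factorization method to compute solutions to least squares problems.


The Cholesky factorization of $A$ is computed in the LinearAlgebra library of Maple 
using the statement 


L := LUDecomposition(A, method = 'Cholesky') 


and gives the lower triangular matrix $L$ as its output. 


Example 4 Determine the Cholesky $LL^t$ factorization of the positive definite matrix
\[ A = \begin{bmatrix} 4 & -1 & 1 \\ -1 & 4.25 & 2.75 \\ 1 & 2.75 & 3.5 \end{bmatrix}. \]

Solution The $LL^t$ factorization does not necessarily have 1s on the diagonal of the lower triangular matrix $L$, so we need to have
\[ A = \begin{bmatrix} a_{11} & a_{21} & a_{31} \\ a_{21} & a_{22} & a_{32} \\ a_{31} & a_{32} & a_{33} \end{bmatrix} = \begin{bmatrix} l_{11} & 0 & 0 \\ l_{21} & l_{22} & 0 \\ l_{31} & l_{32} & l_{33} \end{bmatrix} \begin{bmatrix} l_{11} & l_{21} & l_{31} \\ 0 & l_{22} & l_{32} \\ 0 & 0 & l_{33} \end{bmatrix} \]
\[ = \begin{bmatrix} l_{11}^2 & l_{11}l_{21} & l_{11}l_{31} \\ l_{11}l_{21} & l_{21}^2 + l_{22}^2 & l_{21}l_{31} + l_{22}l_{32} \\ l_{11}l_{31} & l_{21}l_{31} + l_{22}l_{32} & l_{31}^2 + l_{32}^2 + l_{33}^2 \end{bmatrix}. \]
Thus
\[ a_{11}: \; 4 = l_{11}^2 \implies l_{11} = 2, \qquad a_{21}: \; -1 = l_{11}l_{21} \implies l_{21} = -0.5, \]
\[ a_{31}: \; 1 = l_{11}l_{31} \implies l_{31} = 0.5, \qquad a_{22}: \; 4.25 = l_{21}^2 + l_{22}^2 \implies l_{22} = 2, \]
\[ a_{32}: \; 2.75 = l_{21}l_{31} + l_{22}l_{32} \implies l_{32} = 1.5, \qquad a_{33}: \; 3.5 = l_{31}^2 + l_{32}^2 + l_{33}^2 \implies l_{33} = 1, \]
and we have
\[ A = LL^t = \begin{bmatrix} 2 & 0 & 0 \\ -0.5 & 2 & 0 \\ 0.5 & 1.5 & 1 \end{bmatrix} \begin{bmatrix} 2 & -0.5 & 0.5 \\ 0 & 2 & 1.5 \\ 0 & 0 & 1 \end{bmatrix}. \]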
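
A compact Python sketch of the Cholesky Algorithm 6.6 follows. It is an illustration rather than a reference implementation: the name `cholesky` is an assumption, Steps 1, 4, and 6 are merged into one loop because they apply the same formula, and a `ValueError` is raised when a nonpositive value appears under the square root.

```python
import math


def cholesky(A):
    """Return lower triangular L with A = L * L^t for positive definite A."""
    n = len(A)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        # Diagonal entry: Steps 1, 4, and 6 of Algorithm 6.6
        s = A[i][i] - sum(L[i][k] ** 2 for k in range(i))
        if s <= 0:
            raise ValueError("matrix is not positive definite")
        L[i][i] = math.sqrt(s)
        # Entries below the diagonal in column i: Steps 2 and 5
        for j in range(i + 1, n):
            L[j][i] = (A[j][i] - sum(L[j][k] * L[i][k] for k in range(i))) / L[i][i]
    return L


# Matrix used in Example 4 of this section.
A = [[4.0, -1.0, 1.0],
     [-1.0, 4.25, 2.75],
     [1.0, 2.75, 3.5]]
L = cholesky(A)
```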
The $LDL^t$ factorization described in Algorithm 6.5 requires
\[ \frac{1}{6}n^3 + n^2 - \frac{7}{6}n \ \text{multiplications/divisions} \quad \text{and} \quad \frac{1}{6}n^3 - \frac{1}{6}n \ \text{additions/subtractions.} \]
The $LL^t$ Cholesky factorization of a positive definite matrix requires only
\[ \frac{1}{6}n^3 + \frac{1}{2}n^2 - \frac{2}{3}n \ \text{multiplications/divisions} \quad \text{and} \quad \frac{1}{6}n^3 - \frac{1}{6}n \ \text{additions/subtractions.} \]

This computational advantage of Cholesky's factorization is misleading, because it requires 
extracting $n$ square roots. However, the number of operations required for computing the $n$ 
square roots is a linear factor of $n$ and will decrease in significance as $n$ increases. 
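
As a hedged cross-check of the $LDL^t$ count, the sketch below tallies the operations that each step of Algorithm 6.5 performs and compares the totals with the closed-form expressions. The function names are illustrative, and the tally assumes the usual convention that subtracting a sum of $m$ products from a number costs $m$ multiplications and $m$ subtractions.

```python
def ldlt_operation_counts(n):
    """Count the operations Algorithm 6.5 performs on an n x n matrix."""
    muldiv = addsub = 0
    for i in range(1, n + 1):
        muldiv += i - 1                  # Step 2: v_j = l_ij * d_j
        muldiv += i - 1                  # Step 3: products l_ij * v_j
        addsub += i - 1                  # Step 3: subtractions
        for j in range(i + 1, n + 1):    # Step 4, one entry l_ji at a time
            muldiv += (i - 1) + 1        # i - 1 products and one division
            addsub += i - 1
    return muldiv, addsub


def ldlt_formula(n):
    """n^3/6 + n^2 - 7n/6 and n^3/6 - n/6, kept in integer arithmetic."""
    return (n ** 3 + 6 * n ** 2 - 7 * n) // 6, (n ** 3 - n) // 6


counts_match = all(ldlt_operation_counts(n) == ldlt_formula(n)
                   for n in range(1, 20))
```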

Algorithm 6.5 provides a stable method for factoring a positive definite matrix into the form $A = LDL^t$, but it must be modified to solve the linear system $A\mathbf{x} = \mathbf{b}$. To do this, we delete the STOP statement from Step 5 in the algorithm and add the following steps to solve the lower triangular system $L\mathbf{y} = \mathbf{b}$:

Step 6 Set $y_1 = b_1$.
Step 7 For $i = 2, \ldots, n$ set $y_i = b_i - \sum_{j=1}^{i-1} l_{ij}y_j$.

The linear system $D\mathbf{z} = \mathbf{y}$ can then be solved by

Step 8 For $i = 1, \ldots, n$ set $z_i = y_i/d_i$.

Finally, the upper-triangular system $L^t\mathbf{x} = \mathbf{z}$ is solved with the steps given by

Step 9 Set $x_n = z_n$.
Step 10 For $i = n - 1, \ldots, 1$ set $x_i = z_i - \sum_{j=i+1}^{n} l_{ji}x_j$.
Step 11 OUTPUT ($x_i$ for $i = 1, \ldots, n$);
        STOP.
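
Steps 6 through 11 might be coded as in the following sketch, which takes an already computed factor pair. The name `solve_ldlt`, the argument layout, and the right-hand side `b` are assumptions for illustration; `b` was chosen so that the solution of the system with the matrix of Example 3 is $(1, 1, 1)^t$.

```python
def solve_ldlt(L, d, b):
    """Solve A x = b given A = L * diag(d) * L^t (unit lower triangular L)."""
    n = len(b)
    # Steps 6-7: forward substitution for L y = b
    y = [0.0] * n
    for i in range(n):
        y[i] = b[i] - sum(L[i][j] * y[j] for j in range(i))
    # Step 8: diagonal system D z = y
    z = [y[i] / d[i] for i in range(n)]
    # Steps 9-10: backward substitution for L^t x = z
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = z[i] - sum(L[j][i] * x[j] for j in range(i + 1, n))
    return x


# Factors found in Example 3; b is a hypothetical right-hand side.
L = [[1.0, 0.0, 0.0],
     [-0.25, 1.0, 0.0],
     [0.25, 0.75, 1.0]]
d = [4.0, 4.0, 1.0]
b = [4.0, 6.0, 7.25]
x = solve_ldlt(L, d, b)
```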


Table 6.4 shows the additional operations required to solve the linear system. 


Table 6.4


Step     Multiplications/Divisions     Additions/Subtractions
6        0                             0
7        n(n - 1)/2                    n(n - 1)/2
8        n                             0
9        0                             0
10       n(n - 1)/2                    n(n - 1)/2
Total    n^2                           n^2 - n


If the Cholesky factorization given in Algorithm 6.6 is preferred, the additional steps for solving the system $A\mathbf{x} = \mathbf{b}$ are as follows. First delete the STOP statement from Step 7. Then add

Step 8 Set $y_1 = b_1/l_{11}$.

Step 9 For $i = 2, \ldots, n$ set $y_i = \left(b_i - \sum_{j=1}^{i-1} l_{ij}y_j\right)/l_{ii}$.

Step 10 Set $x_n = y_n/l_{nn}$.

Step 11 For $i = n - 1, \ldots, 1$ set $x_i = \left(y_i - \sum_{j=i+1}^{n} l_{ji}x_j\right)/l_{ii}$.

Step 12 OUTPUT ($x_i$ for $i = 1, \ldots, n$);
        STOP.


Steps 8-12 require $n^2 + n$ multiplications/divisions and $n^2 - n$ additions/subtractions.
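
A sketch of Steps 8 through 12 is given below, assuming the Cholesky factor is already available. The name `solve_cholesky` and the right-hand side are illustrative assumptions; the factor is the one found in Example 4, and `b` was chosen so that the exact solution is $(1, 1, 1)^t$.

```python
def solve_cholesky(L, b):
    """Solve A x = b given the Cholesky factor L with A = L * L^t."""
    n = len(b)
    # Steps 8-9: forward substitution for L y = b
    y = [0.0] * n
    for i in range(n):
        y[i] = (b[i] - sum(L[i][j] * y[j] for j in range(i))) / L[i][i]
    # Steps 10-11: backward substitution for L^t x = y
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (y[i] - sum(L[j][i] * x[j] for j in range(i + 1, n))) / L[i][i]
    return x


# Factor found in Example 4; b is a hypothetical right-hand side.
L = [[2.0, 0.0, 0.0],
     [-0.5, 2.0, 0.0],
     [0.5, 1.5, 1.0]]
b = [4.0, 6.0, 7.25]
x = solve_cholesky(L, b)
```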
Band Matrices


The last class of matrices considered are band matrices. In many applications, the band matrices are also strictly diagonally dominant or positive definite.


Definition 6.30 An $n \times n$ matrix is called a band matrix if integers $p$ and $q$, with $1 < p, q < n$, exist with the property that $a_{ij} = 0$ whenever $p \le j - i$ or $q \le i - j$. The band width of a band matrix is defined as $w = p + q - 1$.


The name for a band matrix comes from the fact that all the nonzero entries lie in a band which is centered on the main diagonal.


The number $p$ describes the number of diagonals above, and including, the main diagonal on which nonzero entries may lie. The number $q$ describes the number of diagonals below, and including, the main diagonal on which nonzero entries may lie. For example, the matrix
\[ A = \begin{bmatrix} 7 & 2 & 0 \\ 3 & 5 & -1 \\ 0 & -5 & -6 \end{bmatrix} \]
is a band matrix with $p = q = 2$ and bandwidth $2 + 2 - 1 = 3$.
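
To make the roles of $p$ and $q$ concrete, the sketch below computes the smallest values for which a given matrix meets the zero pattern in Definition 6.30. The function name is hypothetical, and the sketch ignores the requirement $1 < p, q < n$, so a diagonal matrix would be reported with $p = q = 1$.

```python
def band_parameters(A):
    """Return (p, q, w): the smallest p and q for A, and w = p + q - 1."""
    n = len(A)
    # distances of nonzero entries above (or on) the main diagonal
    above = [j - i for i in range(n) for j in range(n)
             if j >= i and A[i][j] != 0]
    # distances of nonzero entries below (or on) the main diagonal
    below = [i - j for i in range(n) for j in range(n)
             if i >= j and A[i][j] != 0]
    p = max(above, default=0) + 1   # a_ij = 0 whenever p <= j - i
    q = max(below, default=0) + 1   # a_ij = 0 whenever q <= i - j
    return p, q, p + q - 1


A = [[7, 2, 0],
     [3, 5, -1],
     [0, -5, -6]]
p, q, w = band_parameters(A)   # p = 2, q = 2, w = 3 for this matrix
```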

The definition of band matrix forces those matrices to concentrate all their nonzero entries about the diagonal. Two special cases of band matrices that occur frequently have $p = q = 2$ and $p = q = 4$.


Tridiagonal Matrices 


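Matrices of bandwidth 3 occurring when $p = q = 2$ are called tridiagonal because they have the form
\[ A = \begin{bmatrix} a_{11} & a_{12} & 0 & \cdots & 0 \\ a_{21} & a_{22} & a_{23} & \ddots & \vdots \\ 0 & a_{32} & a_{33} & \ddots & 0 \\ \vdots & \ddots & \ddots & \ddots & a_{n-1,n} \\ 0 & \cdots & 0 & a_{n,n-1} & a_{nn} \end{bmatrix}. \]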


Tridiagonal matrices are also considered in Chapter 11 in connection with the study of 
piecewise linear approximations to boundary-value problems. The case of $p = q = 4$ will 
be used for the solution of boundary-value problems when the approximating functions 
assume the form of cubic splines. 

The factorization algorithms can be simplified considerably in the case of band matrices 
because a large number of zeros appear in these matrices in regular patterns. It is particularly 
interesting to observe the form the Crout or Doolittle method assumes in this case. 

To illustrate the situation, suppose a tridiagonal matrix $A$ can be factored into the triangular matrices $L$ and $U$. Then $A$ has at most $(3n - 2)$ nonzero entries. Then there are only $(3n - 2)$ conditions to be applied to determine the entries of $L$ and $U$, provided, of course, that the zero entries of $A$ are also obtained.

Suppose that the matrices $L$ and $U$ also have tridiagonal form, that is,
\[ L = \begin{bmatrix} l_{11} & 0 & \cdots & \cdots & 0 \\ l_{21} & l_{22} & \ddots & & \vdots \\ 0 & \ddots & \ddots & \ddots & \vdots \\ \vdots & \ddots & \ddots & \ddots & 0 \\ 0 & \cdots & 0 & l_{n,n-1} & l_{nn} \end{bmatrix} \quad \text{and} \quad U = \begin{bmatrix} 1 & u_{12} & 0 & \cdots & 0 \\ 0 & 1 & \ddots & \ddots & \vdots \\ \vdots & \ddots & \ddots & \ddots & 0 \\ \vdots & & \ddots & \ddots & u_{n-1,n} \\ 0 & \cdots & \cdots & 0 & 1 \end{bmatrix}. \]
There are $(2n - 1)$ undetermined entries of $L$ and $(n - 1)$ undetermined entries of $U$, which totals $(3n - 2)$, the number of possible nonzero entries of $A$. The 0 entries of $A$ are obtained automatically.

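The multiplication involved with $A = LU$ gives, in addition to the 0 entries,
\[ a_{11} = l_{11}; \]
\[ a_{i,i-1} = l_{i,i-1}, \quad \text{for each } i = 2, 3, \ldots, n; \qquad (6.13) \]
\[ a_{ii} = l_{i,i-1}u_{i-1,i} + l_{ii}, \quad \text{for each } i = 2, 3, \ldots, n; \qquad (6.14) \]
and
\[ a_{i,i+1} = l_{ii}u_{i,i+1}, \quad \text{for each } i = 1, 2, \ldots, n - 1. \qquad (6.15) \]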


A solution to this system is found by first using Eq. (6.13) to obtain all the nonzero off-diagonal terms in $L$ and then using Eqs. (6.14) and (6.15) to alternately obtain the remainder of the entries in $U$ and $L$. Once an entry of $L$ or $U$ is computed, the corresponding entry in $A$ is not needed. So the entries in $A$ can be overwritten by the entries in $L$ and $U$ with the result that no new storage is required.

Algorithm 6.7 solves an n x n system of linear equations whose coefficient matrix is 
tridiagonal. This algorithm requires only (5n — 4) multiplications/divisions and (3n — 3) 
additions/subtractions. Consequently, it has considerable computational advantage over the 
methods that do not consider the tridiagonality of the matrix. 


Crout Factorization for Tridiagonal Linear Systems


To solve the $n \times n$ linear system
\[ E_1: \quad a_{11}x_1 + a_{12}x_2 = a_{1,n+1}, \]
\[ E_2: \quad a_{21}x_1 + a_{22}x_2 + a_{23}x_3 = a_{2,n+1}, \]
\[ \vdots \]
\[ E_{n-1}: \quad a_{n-1,n-2}x_{n-2} + a_{n-1,n-1}x_{n-1} + a_{n-1,n}x_n = a_{n-1,n+1}, \]
\[ E_n: \quad a_{n,n-1}x_{n-1} + a_{nn}x_n = a_{n,n+1}, \]
which is assumed to have a unique solution:

INPUT the dimension $n$; the entries of $A$.
OUTPUT the solution $x_1, \ldots, x_n$.

(Steps 1-3 set up and solve $L\mathbf{z} = \mathbf{b}$.)

Step 1 Set $l_{11} = a_{11}$;
           $u_{12} = a_{12}/l_{11}$;
           $z_1 = a_{1,n+1}/l_{11}$.

Step 2 For $i = 2, \ldots, n - 1$ set
           $l_{i,i-1} = a_{i,i-1}$; (ith row of L.)
           $l_{ii} = a_{ii} - l_{i,i-1}u_{i-1,i}$;
           $u_{i,i+1} = a_{i,i+1}/l_{ii}$; ((i + 1)th column of U.)
           $z_i = (a_{i,n+1} - l_{i,i-1}z_{i-1})/l_{ii}$.

Step 3 Set $l_{n,n-1} = a_{n,n-1}$; (nth row of L.)
           $l_{nn} = a_{nn} - l_{n,n-1}u_{n-1,n}$;
           $z_n = (a_{n,n+1} - l_{n,n-1}z_{n-1})/l_{nn}$.

(Steps 4 and 5 solve $U\mathbf{x} = \mathbf{z}$.)

Step 4 Set $x_n = z_n$.

Step 5 For $i = n - 1, \ldots, 1$ set $x_i = z_i - u_{i,i+1}x_{i+1}$.

Step 6 OUTPUT ($x_1, \ldots, x_n$);
       STOP.
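
The algorithm above can be sketched in Python as follows. Storing the matrix as three separate diagonals (`sub`, `diag`, `sup`) and passing the right-hand side as its own list are assumptions of this illustration rather than the augmented-matrix layout used in the algorithm statement. The sketch also assumes $n \ge 2$ and does not test for a zero $l_{ii}$.

```python
def crout_tridiagonal_solve(sub, diag, sup, rhs):
    """Solve a tridiagonal system by Crout factorization.

    sub[i-1] holds the entry left of the diagonal in row i (0-based),
    diag[i] the diagonal entry, sup[i] the entry right of the diagonal.
    """
    n = len(diag)
    l = [0.0] * n          # diagonal of L; the subdiagonal of L equals sub
    u = [0.0] * (n - 1)    # superdiagonal of U
    z = [0.0] * n
    # Step 1
    l[0] = diag[0]
    u[0] = sup[0] / l[0]
    z[0] = rhs[0] / l[0]
    # Steps 2 and 3 (the last row has no superdiagonal entry)
    for i in range(1, n):
        l[i] = diag[i] - sub[i - 1] * u[i - 1]
        if i < n - 1:
            u[i] = sup[i] / l[i]
        z[i] = (rhs[i] - sub[i - 1] * z[i - 1]) / l[i]
    # Steps 4 and 5: back substitution for U x = z
    x = [0.0] * n
    x[n - 1] = z[n - 1]
    for i in range(n - 2, -1, -1):
        x[i] = z[i] - u[i] * x[i + 1]
    return x


# A 4 x 4 system with 2 on the diagonal and -1 beside it.
x = crout_tridiagonal_solve([-1.0, -1.0, -1.0],
                            [2.0, 2.0, 2.0, 2.0],
                            [-1.0, -1.0, -1.0],
                            [1.0, 0.0, 0.0, 1.0])
```

For this system every component of the computed solution is 1 up to rounding.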


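Example 5 Determine the Crout factorization of the symmetric tridiagonal matrix
\[ \begin{bmatrix} 2 & -1 & 0 & 0 \\ -1 & 2 & -1 & 0 \\ 0 & -1 & 2 & -1 \\ 0 & 0 & -1 & 2 \end{bmatrix}, \]
and use this factorization to solve the linear system
\[ 2x_1 - x_2 = 1, \]
\[ -x_1 + 2x_2 - x_3 = 0, \]
\[ -x_2 + 2x_3 - x_4 = 0, \]
\[ -x_3 + 2x_4 = 1. \]

Solution The LU factorization of $A$ has the form
\[ A = \begin{bmatrix} a_{11} & a_{12} & 0 & 0 \\ a_{21} & a_{22} & a_{23} & 0 \\ 0 & a_{32} & a_{33} & a_{34} \\ 0 & 0 & a_{43} & a_{44} \end{bmatrix} = \begin{bmatrix} l_{11} & 0 & 0 & 0 \\ l_{21} & l_{22} & 0 & 0 \\ 0 & l_{32} & l_{33} & 0 \\ 0 & 0 & l_{43} & l_{44} \end{bmatrix} \begin{bmatrix} 1 & u_{12} & 0 & 0 \\ 0 & 1 & u_{23} & 0 \\ 0 & 0 & 1 & u_{34} \\ 0 & 0 & 0 & 1 \end{bmatrix} \]
\[ = \begin{bmatrix} l_{11} & l_{11}u_{12} & 0 & 0 \\ l_{21} & l_{22} + l_{21}u_{12} & l_{22}u_{23} & 0 \\ 0 & l_{32} & l_{33} + l_{32}u_{23} & l_{33}u_{34} \\ 0 & 0 & l_{43} & l_{44} + l_{43}u_{34} \end{bmatrix}. \]
Thus
\[ a_{11}: \; 2 = l_{11} \implies l_{11} = 2, \qquad a_{12}: \; -1 = l_{11}u_{12} \implies u_{12} = -\tfrac{1}{2}, \]
\[ a_{21}: \; -1 = l_{21} \implies l_{21} = -1, \qquad a_{22}: \; 2 = l_{22} + l_{21}u_{12} \implies l_{22} = \tfrac{3}{2}, \]
\[ a_{23}: \; -1 = l_{22}u_{23} \implies u_{23} = -\tfrac{2}{3}, \qquad a_{32}: \; -1 = l_{32} \implies l_{32} = -1, \]
\[ a_{33}: \; 2 = l_{33} + l_{32}u_{23} \implies l_{33} = \tfrac{4}{3}, \qquad a_{34}: \; -1 = l_{33}u_{34} \implies u_{34} = -\tfrac{3}{4}, \]
\[ a_{43}: \; -1 = l_{43} \implies l_{43} = -1, \qquad a_{44}: \; 2 = l_{44} + l_{43}u_{34} \implies l_{44} = \tfrac{5}{4}. \]
This gives the Crout factorization
\[ A = \begin{bmatrix} 2 & -1 & 0 & 0 \\ -1 & 2 & -1 & 0 \\ 0 & -1 & 2 & -1 \\ 0 & 0 & -1 & 2 \end{bmatrix} = \begin{bmatrix} 2 & 0 & 0 & 0 \\ -1 & \tfrac{3}{2} & 0 & 0 \\ 0 & -1 & \tfrac{4}{3} & 0 \\ 0 & 0 & -1 & \tfrac{5}{4} \end{bmatrix} \begin{bmatrix} 1 & -\tfrac{1}{2} & 0 & 0 \\ 0 & 1 & -\tfrac{2}{3} & 0 \\ 0 & 0 & 1 & -\tfrac{3}{4} \\ 0 & 0 & 0 & 1 \end{bmatrix} = LU. \]
Solving the system
\[ L\mathbf{z} = \begin{bmatrix} 2 & 0 & 0 & 0 \\ -1 & \tfrac{3}{2} & 0 & 0 \\ 0 & -1 & \tfrac{4}{3} & 0 \\ 0 & 0 & -1 & \tfrac{5}{4} \end{bmatrix} \begin{bmatrix} z_1 \\ z_2 \\ z_3 \\ z_4 \end{bmatrix} = \begin{bmatrix} 1 \\ 0 \\ 0 \\ 1 \end{bmatrix} \quad \text{gives} \quad \begin{bmatrix} z_1 \\ z_2 \\ z_3 \\ z_4 \end{bmatrix} = \begin{bmatrix} \tfrac{1}{2} \\ \tfrac{1}{3} \\ \tfrac{1}{4} \\ 1 \end{bmatrix}, \]
and then solving
\[ U\mathbf{x} = \begin{bmatrix} 1 & -\tfrac{1}{2} & 0 & 0 \\ 0 & 1 & -\tfrac{2}{3} & 0 \\ 0 & 0 & 1 & -\tfrac{3}{4} \\ 0 & 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \end{bmatrix} = \begin{bmatrix} \tfrac{1}{2} \\ \tfrac{1}{3} \\ \tfrac{1}{4} \\ 1 \end{bmatrix} \quad \text{gives} \quad \begin{bmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \end{bmatrix} = \begin{bmatrix} 1 \\ 1 \\ 1 \\ 1 \end{bmatrix}. \]


The Crout Factorization Algorithm can be applied whenever $l_{ii} \ne 0$ for each $i = 1, 2, \ldots, n$. Two conditions, either of which ensures that this is true, are that the coefficient matrix of the system is positive definite or that it is strictly diagonally dominant. An additional condition that ensures this algorithm can be applied is given in the next theorem, whose proof is considered in Exercise 28.


Theorem 6.31 Suppose that $A = [a_{ij}]$ is tridiagonal with $a_{i,i-1}a_{i,i+1} \ne 0$, for each $i = 2, 3, \ldots, n - 1$. If $|a_{11}| > |a_{12}|$, $|a_{ii}| \ge |a_{i,i-1}| + |a_{i,i+1}|$, for each $i = 2, 3, \ldots, n - 1$, and $|a_{nn}| > |a_{n,n-1}|$, then $A$ is nonsingular and the values of $l_{ii}$ described in the Crout Factorization Algorithm are nonzero for each $i = 1, 2, \ldots, n$.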
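
The hypotheses of Theorem 6.31 are easy to test before running the Crout algorithm. The sketch below is one way to do so, assuming the matrix is stored as three diagonals; the function name and storage layout are illustrative assumptions. The first test matrix, with 2 on the diagonal and $-1$ beside it, is not strictly diagonally dominant in its interior rows but still satisfies the theorem.

```python
def satisfies_theorem_6_31(sub, diag, sup):
    """Check the hypotheses of Theorem 6.31 for a tridiagonal matrix.

    sub[i-1] is the entry left of the diagonal in row i (0-based),
    diag[i] the diagonal entry, sup[i] the entry right of the diagonal.
    """
    n = len(diag)
    interior = range(1, n - 1)
    # off-diagonal products are nonzero in the interior rows
    nonzero = all(sub[i - 1] != 0 and sup[i] != 0 for i in interior)
    first = abs(diag[0]) > abs(sup[0])
    middle = all(abs(diag[i]) >= abs(sub[i - 1]) + abs(sup[i]) for i in interior)
    last = abs(diag[n - 1]) > abs(sub[n - 2])
    return nonzero and first and middle and last


ok = satisfies_theorem_6_31([-1, -1, -1], [2, 2, 2, 2], [-1, -1, -1])
not_ok = satisfies_theorem_6_31([-1, -1, -1], [1, 2, 2, 2], [-1, -1, -1])
```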


The LinearAlgebra package of Maple supports a number of commands that test prop- 
erties for matrices. The return in each case is true if the property holds for the matrix and 
is false if it does not hold. For example, 


IsDefinite(A, query = 'positive_definite')


would return true for the positive definite matrix
\[ A = \begin{bmatrix} 2 & -1 & 0 \\ -1 & 2 & -1 \\ 0 & -1 & 2 \end{bmatrix} \]

but would return false for the matrix 


sel 


Consistent with our definition, symmetry is required for a true result. 
The NumericalAnalysis subpackage also has query commands for matrices. Some of 
these are 


IsMatrixShape(A, 'diagonal')
IsMatrixShape(A, 'symmetric')
IsMatrixShape(A, 'positivedefinite')
IsMatrixShape(A, 'diagonallydominant')
IsMatrixShape(A, 'strictlydiagonallydominant')
IsMatrixShape(A, 'triangular'[upper])
IsMatrixShape(A, 'triangular'[lower])
EXERCISE SET 6.6


1. Determine which of the following matrices are (i) symmetric, (ii) singular, (iii) strictly diagonally 
dominant, (iv) positive definite. 


21 2 1 0 
| a b | 0 3 0 
10 4 
4 2 6 4 00 0 
c 3 0 «7 a | 6 709 0 
a a " |o nu 10 
5 A&W 4 


2. Determine which of the following matrices are (i) symmetric, (ii) singular, (iii) strictly diagonally 
dominant, (iv) positive definite. 


—2 1 2 1 0 
i 1 3 bh | O 32 
1 2 4 
2 -1 O 2 3 1 2 
c -1 4 2 d —2 4 -l 5 
0 2 2 3 7 15 1 
6 —9 3 7 
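3. Use the $LDL^t$ Factorization Algorithm to find a factorization of the form $A = LDL^t$ for the following matrices:

a. $A = \begin{bmatrix} 2 & -1 & 0 \\ -1 & 2 & -1 \\ 0 & -1 & 2 \end{bmatrix}$
b. $A = \begin{bmatrix} 4 & 1 & 1 & 1 \\ 1 & 3 & -1 & 1 \\ 1 & -1 & 2 & 0 \\ 1 & 1 & 0 & 2 \end{bmatrix}$
c. $A = \begin{bmatrix} 4 & 1 & -1 & 0 \\ 1 & 3 & -1 & 0 \\ -1 & -1 & 5 & 2 \\ 0 & 0 & 2 & 4 \end{bmatrix}$
d. $A = \begin{bmatrix} 6 & 2 & 1 & -1 \\ 2 & 4 & 1 & 0 \\ 1 & 1 & 4 & -1 \\ -1 & 0 & -1 & 3 \end{bmatrix}$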
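4. Use the $LDL^t$ Factorization Algorithm to find a factorization of the form $A = LDL^t$ for the following matrices:

a. $A = \begin{bmatrix} 4 & -1 & 1 \\ -1 & 3 & 0 \\ 1 & 0 & 2 \end{bmatrix}$
b. $A = \begin{bmatrix} 4 & 2 & 2 \\ 2 & 6 & 2 \\ 2 & 2 & 5 \end{bmatrix}$
c. $A = \begin{bmatrix} 4 & 0 & 2 & 1 \\ 0 & 3 & -1 & 1 \\ 2 & -1 & 6 & 3 \\ 1 & 1 & 3 & 8 \end{bmatrix}$
d. $A = \begin{bmatrix} 4 & 1 & 1 & 1 \\ 1 & 3 & 0 & -1 \\ 1 & 0 & 2 & 1 \\ 1 & -1 & 1 & 4 \end{bmatrix}$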

5. Use the Cholesky Algorithm to find a factorization of the form A = LL’ for the matrices in 
Exercise 3. 

6. Use the Cholesky Algorithm to find a factorization of the form A = LL’ for the matrices in 
Exercise 4. 

7. Modify the LDL’ Factorization Algorithm as suggested in the text so that it can be used to solve linear 
systems. Use the modified algorithm to solve the following linear systems. 


a 2x, -— xX = 3, b. 4x, + x2 + x3 4+ x4 = 0.65, 
—xX,4+2x%.- x3 = 3, X,+3x.-— 23+ x4 = 0.05, 
— x» +2x3= 1. Xp — X. + 2x3 = 0, 
Xp + X + 2x4 = 0.5. 
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e« 8 4x4,+ m- % = 7; d. 6x, + 2x. + x3-— x4 =O0, 
X, + 3x. -— %3 = 8, 2x, + 4x. + x3 =, 
—X,— 2X +5x3 4 2x4 = —4, y+ Hm +43- x4 =-l, 
2x3 + 4x4 = 6. Xx, — x34 3x4 = —2. 
8. Use the modified algorithm from Exercise 7 to solve the following linear systems.

a. $4x_1 - x_2 + x_3 = -1$,
   $-x_1 + 3x_2 = 4$,
   $x_1 + 2x_3 = 5$.
b. $4x_1 + 2x_2 + 2x_3 = 0$,
   $2x_1 + 6x_2 + 2x_3 = 1$,
   $2x_1 + 2x_2 + 5x_3 = 0$.
c. $4x_1 + 2x_3 + x_4 = -2$,
   $3x_2 - x_3 + x_4 = 0$,
   $2x_1 - x_2 + 6x_3 + 3x_4 = 7$,
   $x_1 + x_2 + 3x_3 + 8x_4 = -2$.
d. $4x_1 + x_2 + x_3 + x_4 = 2$,
   $x_1 + 3x_2 - x_4 = 2$,
   $x_1 + 2x_3 + x_4 = 1$,
   $x_1 - x_2 + x_3 + 4x_4 = 1$.


9. Modify the Cholesky Algorithm as suggested in the text so that it can be used to solve linear systems, 
and use the modified algorithm to solve the linear systems in Exercise 7. 


10. Use the modified algorithm developed in Exercise 9 to solve the linear systems in Exercise 8. 


11. Use Crout factorization for tridiagonal systems to solve the following linear systems. 


a. Xy— X = 0, b. 3x, + Xx =-l, 
—2x; + 4x2 — 2x3 = -1, 2x1 + 4x2. + x3 = 7, 
— xX +2x3 = 1.5. 2x2 + 5x3 = 9. 
ec = 2x,— XxX = 3; d. 0.5x; + 0.25x2 = 0.35, 
—x,+2x%- x3 = —-3, 0.35x, + 0.8x2 + 0.4x3 = 0.77, 
— »+2x%3=1. 0.25%. + 8 x3+0.5x4 = —0.5, 


X3 2X4 = —2.25. 
12. Use Crout factorization for tridiagonal systems to solve the following linear systems. 


a 2x, + xX =3, b. 2x; — X2 =5, 
xX, + 2x%.+ x3 = —2, —xX,+3xy+ 3 =4, 
2x»+3x3 = 0. X_ + 4x3 = 0. 
c. 2x, — Xx =3, d. 2x, -— Xx =1, 
xX, +2x.- x3 = 4, xX, +2x.- x3 =2, 
Xo — 2x3+ x4 = 0, 2x, + 4x3 -— x4 =-l, 
x34+2x4 = 6. 2x4—- X5 = —2, 
X4+2x5 = —-1. 
13. Let $A$ be the $10 \times 10$ tridiagonal matrix given by $a_{ii} = 2$, $a_{i,i+1} = a_{i,i-1} = -1$, for each $i = 2, \ldots, 9$, and $a_{11} = a_{10,10} = 2$, $a_{12} = a_{10,9} = -1$. Let $\mathbf{b}$ be the ten-dimensional column vector given by $b_1 = b_{10} = 1$ and $b_i = 0$, for each $i = 2, 3, \ldots, 9$. Solve $A\mathbf{x} = \mathbf{b}$ using the Crout factorization for tridiagonal systems.


14. Modify the LDL’ factorization to factor a symmetric matrix A. [Note: The factorization may not 
always be possible.] Apply the new algorithm to the following matrices: 


3-3 6 3 -6 9 
a A=] -3 2 -7 b A=! -6 14 -20 
6 —7 13 | 9 -20 29 
-l1 2 0 1 | f 2 -2 4 —4 
2 -3 2 -1 —2 3 -4 5 
S|, Gy, oe. ag ASN good 46) iG 
| 1 -1 6 W [| -4 5 -10 14 


15. Which of the symmetric matrices in Exercise 14 are positive definite? 
16. Find all $\alpha$ so that $A = \begin{bmatrix} \alpha & 1 & -1 \\ 1 & 2 & 1 \\ -1 & 1 & 4 \end{bmatrix}$ is positive definite.


17. Find all $\alpha$ so that $A = \begin{bmatrix} 2 & \alpha & -1 \\ \alpha & 2 & 1 \\ -1 & 1 & \end{bmatrix}$ is positive definite.


18. Find all $\alpha$ and $\beta > 0$ so that the matrix
\[ A = \begin{bmatrix} 4 & \alpha & 1 \\ 2\beta & 5 & 4 \\ \beta & 2 & \alpha \end{bmatrix} \]
is strictly diagonally dominant.


19. Find all $\alpha > 0$ and $\beta > 0$ so that the matrix
\[ A = \begin{bmatrix} 3 & 2 & \beta \\ \alpha & 5 & \beta \\ 2 & 1 & \alpha \end{bmatrix} \]
is strictly diagonally dominant.


20. Suppose that $A$ and $B$ are strictly diagonally dominant $n \times n$ matrices. Which of the following must be strictly diagonally dominant?
a. $-A$  b. $A^t$  c. $A + B$  d. $A^2$  e. $A - B$


21. Suppose that $A$ and $B$ are positive definite $n \times n$ matrices. Which of the following must be positive definite?
a. $-A$  b. $A^t$  c. $A + B$  d. $A^2$  e. $A - B$


22. Let
\[ A = \begin{bmatrix} 1 & 0 & -1 \\ 0 & 1 & 1 \\ -1 & 1 & \alpha \end{bmatrix}. \]
Find all values of $\alpha$ for which
a. $A$ is singular.  b. $A$ is strictly diagonally dominant.
c. $A$ is symmetric.  d. $A$ is positive definite.


23. Let


Find all values of $\alpha$ and $\beta$ for which
a. $A$ is singular.  b. $A$ is strictly diagonally dominant.
c. $A$ is symmetric.  d. $A$ is positive definite.


24. Suppose $A$ and $B$ commute, that is, $AB = BA$. Must $A^t$ and $B^t$ also commute?

25. Construct a matrix $A$ that is nonsymmetric but for which $\mathbf{x}^t A \mathbf{x} > 0$ for all $\mathbf{x} \ne \mathbf{0}$.

26. Show that Gaussian elimination can be performed on $A$ without row interchanges if and only if all leading principal submatrices of $A$ are nonsingular. [Hint: Partition each matrix in the equation
\[ A^{(k)} = M^{(k-1)}M^{(k-2)} \cdots M^{(1)}A \]
vertically between the $k$th and $(k + 1)$st columns and horizontally between the $k$th and $(k + 1)$st rows (see Exercise 14 of Section 6.3). Show that the nonsingularity of the leading principal submatrix of $A$ is equivalent to $a_{kk}^{(k)} \ne 0$.]
27. Tridiagonal matrices are usually labeled by using the notation
\[ A = \begin{bmatrix} a_1 & c_1 & 0 & \cdots & 0 \\ b_2 & a_2 & c_2 & \ddots & \vdots \\ 0 & b_3 & \ddots & \ddots & 0 \\ \vdots & \ddots & \ddots & \ddots & c_{n-1} \\ 0 & \cdots & 0 & b_n & a_n \end{bmatrix} \]
to emphasize that it is not necessary to consider all the matrix entries. Rewrite the Crout Factorization Algorithm using this notation, and change the notation of the $l_{ij}$ and $u_{ij}$ in a similar manner.


28. Prove Theorem 6.31. [Hint: Show that $|u_{i,i+1}| < 1$, for each $i = 1, 2, \ldots, n - 1$, and that $|l_{ii}| > 0$, for each $i = 1, 2, \ldots, n$. Deduce that $\det A = \det L \cdot \det U \ne 0$.]


29. Suppose V = 5.5 volts in the lead example of this chapter. By reordering the equations, a tridiagonal 
linear system can be formed. Use the Crout Factorization Algorithm to find the solution of the modified 
system. 


30. Construct the operation count for solving an n x n linear system using the Crout Factorization Algo- 
rithm. 


31. In a paper by Dorn and Burdick [DoB], it is reported that the average wing length that resulted from mating three mutant varieties of fruit flies (Drosophila melanogaster) can be expressed in the symmetric matrix form
\[ A = \begin{bmatrix} 1.59 & 1.69 & 2.13 \\ 1.69 & 1.31 & 1.72 \\ 2.13 & 1.72 & 1.85 \end{bmatrix}, \]
where $a_{ij}$ denotes the average wing length of an offspring resulting from the mating of a male of type $i$ with a female of type $j$.

a. What physical significance is associated with the symmetry of this matrix?
b. Is this matrix positive definite? If so, prove it; if not, find a nonzero vector $\mathbf{x}$ for which $\mathbf{x}^t A \mathbf{x} \le 0$.


32. Suppose that the positive definite matrix $A$ has the Cholesky factorization $A = LL^t$ and also the factorization $A = \hat{L}D\hat{L}^t$, where $D$ is the diagonal matrix with positive diagonal entries $d_{11}, d_{22}, \ldots, d_{nn}$. Let $D^{1/2}$ be the diagonal matrix with diagonal entries $\sqrt{d_{11}}, \sqrt{d_{22}}, \ldots, \sqrt{d_{nn}}$.
a. Show that $D = D^{1/2}D^{1/2}$.  b. Show that $L = \hat{L}D^{1/2}$.


6.7 Survey of Methods and Software


In this chapter we have looked at direct methods for solving linear systems. A linear system 
consists of n equations in n unknowns expressed in matrix notation as Ax = b. These 
techniques use a finite sequence of arithmetic operations to determine the exact solution of 
the system subject only to round-off error. We found that the linear system $A\mathbf{x} = \mathbf{b}$ has a 
unique solution if and only if $A^{-1}$ exists, which is equivalent to $\det A \ne 0$. When $A^{-1}$ is 
known, the solution of the linear system is the vector $\mathbf{x} = A^{-1}\mathbf{b}$. 

Pivoting techniques were introduced to minimize the effects of round-off error, which 
can dominate the solution when using direct methods. We studied partial pivoting, scaled 
partial pivoting, and briefly discussed complete pivoting. We recommend the partial or 
scaled partial pivoting methods for most problems because these decrease the effects of 
round-off error without adding much extra computation. Complete pivoting should be used 
if round-off error is suspected to be large. In Section 5 of Chapter 7 we will see some 
procedures for estimating this round-off error. 

Gaussian elimination with minor modifications was shown to yield a factorization 
of the matrix A into LU, where L is lower triangular with 1s on the diagonal and U is 
upper triangular. This process is called Doolittle factorization. Not all nonsingular matrices can be factored this way, but a permutation of the rows will always give a factorization of the form $PA = LU$, where $P$ is the permutation matrix used to rearrange the rows of $A$. The advantage of the factorization is that the work is significantly reduced when solving linear systems $A\mathbf{x} = \mathbf{b}$ with the same coefficient matrix $A$ and different vectors $\mathbf{b}$.

Factorizations take a simpler form when the matrix $A$ is positive definite. For example, the Cholesky factorization has the form $A = LL^t$, where $L$ is lower triangular. A symmetric matrix that has an LU factorization can also be factored in the form $A = LDL^t$, where $D$ is diagonal and $L$ is lower triangular with 1s on the diagonal. With these factorizations, manipulations involving $A$ can be simplified. If $A$ is tridiagonal, the LU factorization takes a particularly simple form, with $U$ having 1s on the main diagonal and 0s elsewhere, except on the diagonal immediately above the main diagonal. In addition, $L$ has its only nonzero entries on the main diagonal and one diagonal below. Another important method of matrix factorization is considered in Section 6 of Chapter 9.

The direct methods are the methods of choice for most linear systems. For tridiago- 
nal, banded, and positive definite matrices, the special methods are recommended. For the 
general case, Gaussian elimination or LU factorization methods, which allow pivoting, are 
recommended. In these cases, the effects of round-off error should be monitored. In Section 
7.5 we discuss estimating errors in direct methods. 

Large linear systems with primarily 0 entries occurring in regular patterns can be 
solved efficiently using an iterative procedure such as those discussed in Chapter 7. Systems 
of this type arise naturally, for example, when finite-difference techniques are used to 
solve boundary-value problems, a common application in the numerical solution of partial- 
differential equations. 

It can be very difficult to solve a large linear system that has primarily nonzero entries 
or one where the 0 entries are not in a predictable pattern. The matrix associated with 
the system can be placed in secondary storage in partitioned form and portions read into 
main memory only as needed for calculation. Methods that require secondary storage can 
be either iterative or direct, but they generally require techniques from the fields of data 
structures and graph theory. The reader is referred to [BuR] and [RW] for a discussion of 
the current techniques. 

The software for matrix operations and the direct solution of linear systems imple- 
mented in IMSL and NAG is based on LAPACK, a subroutine package in the public domain. 
There is excellent documentation available with it and from the books written about it. We 
will focus on several of the subroutines that are available in all three sources. 

Accompanying LAPACK is a set of lower-level operations called Basic Linear Algebra Subprograms (BLAS). Level 1 of BLAS generally consists of vector-vector operations such as vector additions with input data and operation counts of $O(n)$. Level 2 consists of the matrix-vector operations such as the product of a matrix and a vector with input data and operation counts of $O(n^2)$. Level 3 consists of the matrix-matrix operations such as matrix products with input data and operation counts of $O(n^3)$.

The subroutines in LAPACK for solving linear systems first factor the matrix A. The 
factorization depends on the type of matrix in the following way: 


1. General matrix $PA = LU$;

2. Positive definite matrix $A = LL^t$;

3. Symmetric matrix $A = LDL^t$;

4. Tridiagonal matrix $A = LU$ (in banded form).


In addition, inverses and determinants can be computed. 
Many of the subroutines in LINPACK, and its successor LAPACK, can be implemented using MATLAB. A nonsingular matrix $A$ can be factored into the form $PA = LU$, where $P$ is the permutation matrix defined by performing partial pivoting to solve a linear system involving $A$. A system of the form $A\mathbf{x} = \mathbf{b}$ is then solved by solving a lower triangular system followed by an upper triangular system.

Other MATLAB commands include computing the inverse, transpose, and determinant of matrix $A$ by issuing the commands inv(A), A', and det(A), respectively.

The IMSL Library includes counterparts to almost all the LAPACK subroutines and 
some extensions as well. The NAG Library has numerous subroutines for direct methods 
of solving linear systems similar to those in LAPACK and IMSL. 

Further information on the numerical solution of linear systems and matrices can be 
found in Golub and Van Loan [GV], Forsythe and Moler [FM], and Stewart [Stew1]. The 
use of direct techniques for solving large sparse systems is discussed in detail in George and 
Liu [GL] and in Pissanetzky [Pi]. Coleman and Van Loan [CV] consider the use of BLAS, 
LINPACK, and MATLAB. 
Introduction


Trusses are lightweight structures capable of carrying heavy loads. In bridge design, the individual members of the truss are connected with rotatable pin joints that permit forces to be transferred from one member of the truss to another. The accompanying figure shows a truss that is held stationary at the lower left endpoint ①, is permitted to move horizontally at the lower right endpoint ④, and has pin joints at ①, ②, ③, and ④. A load of 10,000 newtons (N) is placed at joint ③, and the resulting forces on the joints are given by $f_1$, $f_2$, $f_3$, $f_4$, and $f_5$, as shown. When positive, these forces indicate tension on the truss elements, and when negative, compression. The stationary support member could have both a horizontal force component $F_1$ and a vertical force component $F_2$, but the movable support member has only a vertical force component $F_3$.


10,000 N 


If the truss is in static equilibrium, the forces at each joint must add to the zero vector, so 
the sum of the horizontal and vertical components at each joint must be 0. This produces the 
system of linear equations shown in the accompanying table. An 8 x 8 matrix describing this 
system has 47 zero entries and only 17 nonzero entries. Matrices with a high percentage 
of zero entries are called sparse and are often solved using iterative, rather than direct, 
techniques. The iterative solution to this system is considered in Exercise 18 of Section 7.3 
and Exercise 10 in Section 7.4. 


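Joint    Horizontal Component                                    Vertical Component
①        $-F_1 + \frac{\sqrt{2}}{2}f_1 + f_2 = 0$                $\frac{\sqrt{2}}{2}f_1 - F_2 = 0$
②        $-\frac{\sqrt{2}}{2}f_1 + \frac{\sqrt{3}}{2}f_4 = 0$    $-\frac{\sqrt{2}}{2}f_1 - f_3 - \frac{1}{2}f_4 = 0$
③        $-f_2 + f_5 = 0$                                        $f_3 - 10{,}000 = 0$
④        $-\frac{\sqrt{3}}{2}f_4 - f_5 = 0$                      $\frac{1}{2}f_4 - F_3 = 0$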
The methods presented in Chapter 6 used direct techniques to solve a system of $n \times n$ 
linear equations of the form $A\mathbf{x} = \mathbf{b}$. In this chapter, we present iterative methods to solve 
a system of this type. 


7.1 Norms of Vectors and Matrices


In Chapter 2 we described iterative techniques for finding roots of equations of the form
$f(x) = 0$. An initial approximation (or approximations) was found, and new approximations
were then determined based on how well the previous approximations satisfied the equation.
The objective is to find a way to minimize the difference between the approximations and
the exact solution.

To discuss iterative methods for solving linear systems, we first need to determine a 
way to measure the distance between n-dimensional column vectors. This will permit us to 
determine whether a sequence of vectors converges to a solution of the system. 

In actuality, this measure is also needed when the solution is obtained by the direct
methods presented in Chapter 6. Those methods required a large number of arithmetic
operations, and using finite-digit arithmetic leads only to an approximation to an actual
solution of the system.

A scalar is a real (or complex) number generally denoted using italic or Greek letters.
Vectors are denoted using boldface letters.


Vector Norms 


Let $\mathbb{R}^n$ denote the set of all n-dimensional column vectors with real-number components.
To define a distance in $\mathbb{R}^n$ we use the notion of a norm, which is the generalization of the
absolute value on $\mathbb{R}$, the set of real numbers.


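Definition 7.1   A vector norm on $\mathbb{R}^n$ is a function, $\|\cdot\|$, from $\mathbb{R}^n$ into $\mathbb{R}$ with the following properties:
(i) $\|\mathbf{x}\| \ge 0$ for all $\mathbf{x} \in \mathbb{R}^n$,
(ii) $\|\mathbf{x}\| = 0$ if and only if $\mathbf{x} = \mathbf{0}$,
(iii) $\|\alpha\mathbf{x}\| = |\alpha|\,\|\mathbf{x}\|$ for all $\alpha \in \mathbb{R}$ and $\mathbf{x} \in \mathbb{R}^n$,
(iv) $\|\mathbf{x} + \mathbf{y}\| \le \|\mathbf{x}\| + \|\mathbf{y}\|$ for all $\mathbf{x}, \mathbf{y} \in \mathbb{R}^n$.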


Vectors in $\mathbb{R}^n$ are column vectors, and it is convenient to use the transpose notation
presented in Section 6.3 when a vector is represented in terms of its components. For
example, the vector

\[ \mathbf{x} = \begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{bmatrix} \]

will be written $\mathbf{x} = (x_1, x_2, \ldots, x_n)^t$.
We will need only two specific norms on $\mathbb{R}^n$, although a third norm on $\mathbb{R}^n$ is presented
in Exercise 2.


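Definition 7.2   The $l_2$ and $l_\infty$ norms for the vector $\mathbf{x} = (x_1, x_2, \ldots, x_n)^t$ are defined by

\[ \|\mathbf{x}\|_2 = \left\{ \sum_{i=1}^{n} x_i^2 \right\}^{1/2} \quad\text{and}\quad \|\mathbf{x}\|_\infty = \max_{1 \le i \le n} |x_i|. \]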
Note that each of these norms reduces to the absolute value in the case n = 1.

The $l_2$ norm is called the Euclidean norm of the vector $\mathbf{x}$ because it represents the
usual notion of distance from the origin in case $\mathbf{x}$ is in $\mathbb{R}^1 \equiv \mathbb{R}$, $\mathbb{R}^2$, or $\mathbb{R}^3$. For example, the
$l_2$ norm of the vector $\mathbf{x} = (x_1, x_2, x_3)^t$ gives the length of the straight line joining the points
$(0, 0, 0)$ and $(x_1, x_2, x_3)$. Figure 7.1 shows the boundary of those vectors in $\mathbb{R}^2$ and $\mathbb{R}^3$ that
have $l_2$ norm less than 1. Figure 7.2 is a similar illustration for the $l_\infty$ norm.
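
The two norms in Definition 7.2 translate directly into code. The following is a minimal sketch in plain Python; the function names `norm_2` and `norm_inf` and the sample vector are chosen here for illustration only.

```python
import math

def norm_2(x):
    """Euclidean (l_2) norm: square root of the sum of squared components."""
    return math.sqrt(sum(xi * xi for xi in x))

def norm_inf(x):
    """l_infinity norm: largest absolute value among the components."""
    return max(abs(xi) for xi in x)

x = [-1.0, 1.0, -2.0]
print(norm_2(x))    # equals sqrt(6)
print(norm_inf(x))  # equals 2.0

# For n = 1 both norms reduce to the absolute value.
print(norm_2([-3.5]), norm_inf([-3.5]))
```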


Figure 7.1   The vectors in $\mathbb{R}^2$ with $l_2$ norm less than 1 are inside the first figure; the vectors in the
first octant of $\mathbb{R}^3$ with $l_2$ norm less than 1 are inside the second figure.

Figure 7.2   The vectors in $\mathbb{R}^2$ with $l_\infty$ norm less than 1 are inside the first figure; the vectors in the
first octant of $\mathbb{R}^3$ with $l_\infty$ norm less than 1 are inside the second figure.
Example 1   Determine the $l_2$ norm and the $l_\infty$ norm of the vector $\mathbf{x} = (-1, 1, -2)^t$.

Solution   The vector $\mathbf{x} = (-1, 1, -2)^t$ in $\mathbb{R}^3$ has norms

\[ \|\mathbf{x}\|_2 = \sqrt{(-1)^2 + (1)^2 + (-2)^2} = \sqrt{6} \]

and

\[ \|\mathbf{x}\|_\infty = \max\{|-1|, |1|, |-2|\} = 2. \]

It is easy to show that the properties in Definition 7.1 hold for the $l_\infty$ norm because
they follow from similar results for absolute values. The only property that requires much
demonstration is (iv), and in this case if $\mathbf{x} = (x_1, x_2, \ldots, x_n)^t$ and $\mathbf{y} = (y_1, y_2, \ldots, y_n)^t$, then

\[ \|\mathbf{x} + \mathbf{y}\|_\infty = \max_{1 \le i \le n} |x_i + y_i| \le \max_{1 \le i \le n} (|x_i| + |y_i|) \le \max_{1 \le i \le n} |x_i| + \max_{1 \le i \le n} |y_i| = \|\mathbf{x}\|_\infty + \|\mathbf{y}\|_\infty. \]

The first three conditions also are easy to show for the $l_2$ norm. But to show that

\[ \|\mathbf{x} + \mathbf{y}\|_2 \le \|\mathbf{x}\|_2 + \|\mathbf{y}\|_2, \quad \text{for each } \mathbf{x}, \mathbf{y} \in \mathbb{R}^n, \]

we need a famous inequality.

Theorem 7.3   (Cauchy-Bunyakovsky-Schwarz Inequality for Sums)
For each $\mathbf{x} = (x_1, x_2, \ldots, x_n)^t$ and $\mathbf{y} = (y_1, y_2, \ldots, y_n)^t$ in $\mathbb{R}^n$,

\[ \mathbf{x}^t\mathbf{y} = \sum_{i=1}^{n} x_i y_i \le \left\{ \sum_{i=1}^{n} x_i^2 \right\}^{1/2} \left\{ \sum_{i=1}^{n} y_i^2 \right\}^{1/2} = \|\mathbf{x}\|_2 \cdot \|\mathbf{y}\|_2. \tag{7.1} \]

There are many forms of this inequality, hence many discoverers. Augustin Louis
Cauchy (1789-1857) described this inequality in 1821 in Cours d’Analyse Algébrique, the first
rigorous calculus book. An integral form of the inequality appears in the work of Viktor
Yakovlevich Bunyakovsky (1804-1889) in 1859, and Hermann Amandus Schwarz (1843-1921)
used a double integral form of this inequality in 1885. More details on the history can be
found in [Stee].


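Proof   If $\mathbf{y} = \mathbf{0}$ or $\mathbf{x} = \mathbf{0}$, the result is immediate because both sides of the inequality are
zero.
Suppose $\mathbf{y} \ne \mathbf{0}$ and $\mathbf{x} \ne \mathbf{0}$. Note that for each $\lambda \in \mathbb{R}$ we have

\[ 0 \le \|\mathbf{x} - \lambda\mathbf{y}\|_2^2 = \sum_{i=1}^{n} (x_i - \lambda y_i)^2 = \sum_{i=1}^{n} x_i^2 - 2\lambda \sum_{i=1}^{n} x_i y_i + \lambda^2 \sum_{i=1}^{n} y_i^2, \]

so that

\[ 2\lambda \sum_{i=1}^{n} x_i y_i \le \sum_{i=1}^{n} x_i^2 + \lambda^2 \sum_{i=1}^{n} y_i^2 = \|\mathbf{x}\|_2^2 + \lambda^2 \|\mathbf{y}\|_2^2. \]

However $\|\mathbf{x}\|_2 > 0$ and $\|\mathbf{y}\|_2 > 0$, so we can let $\lambda = \|\mathbf{x}\|_2 / \|\mathbf{y}\|_2$ to give

\[ \left( 2\,\frac{\|\mathbf{x}\|_2}{\|\mathbf{y}\|_2} \right) \left( \sum_{i=1}^{n} x_i y_i \right) \le \|\mathbf{x}\|_2^2 + \frac{\|\mathbf{x}\|_2^2}{\|\mathbf{y}\|_2^2}\,\|\mathbf{y}\|_2^2 = 2\|\mathbf{x}\|_2^2. \]

Hence

\[ 2 \sum_{i=1}^{n} x_i y_i \le 2\|\mathbf{x}\|_2^2\,\frac{\|\mathbf{y}\|_2}{\|\mathbf{x}\|_2} = 2\|\mathbf{x}\|_2 \|\mathbf{y}\|_2, \]

and

\[ \mathbf{x}^t\mathbf{y} = \sum_{i=1}^{n} x_i y_i \le \|\mathbf{x}\|_2 \|\mathbf{y}\|_2 = \left\{ \sum_{i=1}^{n} x_i^2 \right\}^{1/2} \left\{ \sum_{i=1}^{n} y_i^2 \right\}^{1/2}. \]

With this result we see that for each $\mathbf{x}, \mathbf{y} \in \mathbb{R}^n$,

\[ \|\mathbf{x} + \mathbf{y}\|_2^2 = \sum_{i=1}^{n} (x_i + y_i)^2 = \sum_{i=1}^{n} x_i^2 + 2\sum_{i=1}^{n} x_i y_i + \sum_{i=1}^{n} y_i^2 \le \|\mathbf{x}\|_2^2 + 2\|\mathbf{x}\|_2\|\mathbf{y}\|_2 + \|\mathbf{y}\|_2^2, \]

which gives norm property (iv):

\[ \|\mathbf{x} + \mathbf{y}\|_2 \le \left( \|\mathbf{x}\|_2^2 + 2\|\mathbf{x}\|_2\|\mathbf{y}\|_2 + \|\mathbf{y}\|_2^2 \right)^{1/2} = \|\mathbf{x}\|_2 + \|\mathbf{y}\|_2. \]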
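
The Cauchy-Bunyakovsky-Schwarz inequality and the triangle inequality for the $l_2$ norm can be spot-checked numerically. The sketch below tests both on randomly generated vectors; the helper names, the random seed, and the small tolerance that absorbs rounding error are illustrative choices, and passing such a check is of course no substitute for the proof.

```python
import math
import random

def dot(x, y):
    return sum(a * b for a, b in zip(x, y))

def norm_2(x):
    return math.sqrt(dot(x, x))

def check_inequalities(trials=1000, tol=1e-9, seed=0):
    """Return True if both inequalities hold on every random trial."""
    rng = random.Random(seed)
    for _ in range(trials):
        n = rng.randint(1, 6)
        x = [rng.uniform(-10, 10) for _ in range(n)]
        y = [rng.uniform(-10, 10) for _ in range(n)]
        s = [a + b for a, b in zip(x, y)]
        if dot(x, y) > norm_2(x) * norm_2(y) + tol:      # inequality (7.1)
            return False
        if norm_2(s) > norm_2(x) + norm_2(y) + tol:      # norm property (iv)
            return False
    return True

print(check_inequalities())
```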


Distance between Vectors in $\mathbb{R}^n$


The norm of a vector gives a measure for the distance between an arbitrary vector and 
the zero vector, just as the absolute value of a real number describes its distance from 0. 
Similarly, the distance between two vectors is defined as the norm of the difference of the 
vectors just as distance between two real numbers is the absolute value of their difference. 


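Definition 7.4   If $\mathbf{x} = (x_1, x_2, \ldots, x_n)^t$ and $\mathbf{y} = (y_1, y_2, \ldots, y_n)^t$ are vectors in $\mathbb{R}^n$, the $l_2$ and $l_\infty$ distances
between $\mathbf{x}$ and $\mathbf{y}$ are defined by

\[ \|\mathbf{x} - \mathbf{y}\|_2 = \left\{ \sum_{i=1}^{n} (x_i - y_i)^2 \right\}^{1/2} \quad\text{and}\quad \|\mathbf{x} - \mathbf{y}\|_\infty = \max_{1 \le i \le n} |x_i - y_i|. \]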


Example 2   The linear system

\[ \begin{aligned} 3.3330x_1 + 15920x_2 - 10.333x_3 &= 15913, \\ 2.2220x_1 + 16.710x_2 + 9.6120x_3 &= 28.544, \\ 1.5611x_1 + 5.1791x_2 + 1.6852x_3 &= 8.4254 \end{aligned} \]

has the exact solution $\mathbf{x} = (x_1, x_2, x_3)^t = (1, 1, 1)^t$, and Gaussian elimination performed
using five-digit rounding arithmetic and partial pivoting (Algorithm 6.2) produces the
approximate solution

\[ \tilde{\mathbf{x}} = (\tilde{x}_1, \tilde{x}_2, \tilde{x}_3)^t = (1.2001, 0.99991, 0.92538)^t. \]

Determine the $l_2$ and $l_\infty$ distances between the exact and approximate solutions.

Solution   Measurements of $\mathbf{x} - \tilde{\mathbf{x}}$ are given by

\[ \|\mathbf{x} - \tilde{\mathbf{x}}\|_\infty = \max\{|1 - 1.2001|, |1 - 0.99991|, |1 - 0.92538|\} = \max\{0.2001, 0.00009, 0.07462\} = 0.2001 \]

and

\[ \|\mathbf{x} - \tilde{\mathbf{x}}\|_2 = \left[ (1 - 1.2001)^2 + (1 - 0.99991)^2 + (1 - 0.92538)^2 \right]^{1/2} = \left[ (0.2001)^2 + (0.00009)^2 + (0.07462)^2 \right]^{1/2} = 0.21356. \]

Although the components $\tilde{x}_2$ and $\tilde{x}_3$ are good approximations to $x_2$ and $x_3$, the component
$\tilde{x}_1$ is a poor approximation to $x_1$, and $|x_1 - \tilde{x}_1|$ dominates both norms.
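
The two distances in Example 2 can be reproduced with a few lines of Python. This is a minimal sketch using only the exact and approximate solutions quoted in the example; the variable names are illustrative.

```python
import math

x_exact = [1.0, 1.0, 1.0]
x_approx = [1.2001, 0.99991, 0.92538]

diff = [a - b for a, b in zip(x_exact, x_approx)]
dist_inf = max(abs(d) for d in diff)          # l_infinity distance
dist_2 = math.sqrt(sum(d * d for d in diff))  # l_2 distance

print(round(dist_inf, 4), round(dist_2, 5))   # prints 0.2001 0.21356
```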
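The concept of distance in $\mathbb{R}^n$ is also used to define a limit of a sequence of vectors in
this space.

Definition 7.5   A sequence $\{\mathbf{x}^{(k)}\}_{k=1}^{\infty}$ of vectors in $\mathbb{R}^n$ is said to converge to $\mathbf{x}$ with respect to the norm $\|\cdot\|$
if, given any $\varepsilon > 0$, there exists an integer $N(\varepsilon)$ such that

\[ \|\mathbf{x}^{(k)} - \mathbf{x}\| < \varepsilon, \quad \text{for all } k \ge N(\varepsilon). \]

Theorem 7.6   The sequence of vectors $\{\mathbf{x}^{(k)}\}$ converges to $\mathbf{x}$ in $\mathbb{R}^n$ with respect to the $l_\infty$ norm if and only
if $\lim_{k \to \infty} x_i^{(k)} = x_i$, for each $i = 1, 2, \ldots, n$.

Proof   Suppose $\{\mathbf{x}^{(k)}\}$ converges to $\mathbf{x}$ with respect to the $l_\infty$ norm. Given any $\varepsilon > 0$, there
exists an integer $N(\varepsilon)$ such that for all $k \ge N(\varepsilon)$,

\[ \max_{i=1,2,\ldots,n} |x_i^{(k)} - x_i| = \|\mathbf{x}^{(k)} - \mathbf{x}\|_\infty < \varepsilon. \]

This result implies that $|x_i^{(k)} - x_i| < \varepsilon$, for each $i = 1, 2, \ldots, n$, so $\lim_{k \to \infty} x_i^{(k)} = x_i$ for
each $i$.
Conversely, suppose that $\lim_{k \to \infty} x_i^{(k)} = x_i$, for every $i = 1, 2, \ldots, n$. For a given $\varepsilon > 0$,
let $N_i(\varepsilon)$ for each $i$ represent an integer with the property that

\[ |x_i^{(k)} - x_i| < \varepsilon, \]

whenever $k \ge N_i(\varepsilon)$.
Define $N(\varepsilon) = \max_{i=1,2,\ldots,n} N_i(\varepsilon)$. If $k \ge N(\varepsilon)$, then

\[ \max_{i=1,2,\ldots,n} |x_i^{(k)} - x_i| = \|\mathbf{x}^{(k)} - \mathbf{x}\|_\infty < \varepsilon. \]

This implies that $\{\mathbf{x}^{(k)}\}$ converges to $\mathbf{x}$ with respect to the $l_\infty$ norm.

Example 3   Show that

\[ \mathbf{x}^{(k)} = (x_1^{(k)}, x_2^{(k)}, x_3^{(k)}, x_4^{(k)})^t = \left( 1,\; 2 + \frac{1}{k},\; \frac{3}{k^2},\; e^{-k}\sin k \right)^t \]

converges to $\mathbf{x} = (1, 2, 0, 0)^t$ with respect to the $l_\infty$ norm.

Solution   Because

\[ \lim_{k \to \infty} 1 = 1, \quad \lim_{k \to \infty} (2 + 1/k) = 2, \quad \lim_{k \to \infty} 3/k^2 = 0, \quad\text{and}\quad \lim_{k \to \infty} e^{-k}\sin k = 0, \]

Theorem 7.6 implies that the sequence $\{\mathbf{x}^{(k)}\}$ converges to $(1, 2, 0, 0)^t$ with respect to the
$l_\infty$ norm.

To show directly that the sequence in Example 3 converges to $(1, 2, 0, 0)^t$ with respect
to the $l_2$ norm is quite complicated. It is better to prove the next result and apply it to this
special case.

Theorem 7.7   For each $\mathbf{x} \in \mathbb{R}^n$,

\[ \|\mathbf{x}\|_\infty \le \|\mathbf{x}\|_2 \le \sqrt{n}\,\|\mathbf{x}\|_\infty. \]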
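Proof   Let $x_j$ be a coordinate of $\mathbf{x}$ such that $\|\mathbf{x}\|_\infty = \max_{1 \le i \le n} |x_i| = |x_j|$. Then

\[ \|\mathbf{x}\|_\infty^2 = |x_j|^2 = x_j^2 \le \sum_{i=1}^{n} x_i^2 = \|\mathbf{x}\|_2^2, \]

and

\[ \|\mathbf{x}\|_\infty \le \|\mathbf{x}\|_2. \]

Also,

\[ \|\mathbf{x}\|_2^2 = \sum_{i=1}^{n} x_i^2 \le \sum_{i=1}^{n} x_j^2 = n x_j^2 = n\|\mathbf{x}\|_\infty^2, \]

and $\|\mathbf{x}\|_2 \le \sqrt{n}\,\|\mathbf{x}\|_\infty$.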


Figure 7.3 illustrates this result when n = 2. 


Figure 7.3 


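Example 4   In Example 3, we found that the sequence $\{\mathbf{x}^{(k)}\}$, defined by

\[ \mathbf{x}^{(k)} = \left( 1,\; 2 + \frac{1}{k},\; \frac{3}{k^2},\; e^{-k}\sin k \right)^t, \]

converges to $\mathbf{x} = (1, 2, 0, 0)^t$ with respect to the $l_\infty$ norm. Show that this sequence also
converges to $\mathbf{x}$ with respect to the $l_2$ norm.

Solution   Given any $\varepsilon > 0$, there exists an integer $N(\varepsilon/2)$ with the property that

\[ \|\mathbf{x}^{(k)} - \mathbf{x}\|_\infty < \frac{\varepsilon}{2}, \]

whenever $k \ge N(\varepsilon/2)$. By Theorem 7.7, this implies that

\[ \|\mathbf{x}^{(k)} - \mathbf{x}\|_2 \le \sqrt{4}\,\|\mathbf{x}^{(k)} - \mathbf{x}\|_\infty < 2(\varepsilon/2) = \varepsilon, \]

when $k \ge N(\varepsilon/2)$. So $\{\mathbf{x}^{(k)}\}$ also converges to $\mathbf{x}$ with respect to the $l_2$ norm.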
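
The behavior described in Examples 3 and 4 can be observed numerically. The sketch below evaluates the sequence for a few values of k and reports the error in both norms; by Theorem 7.7 with n = 4 the $l_2$ error must lie between the $l_\infty$ error and twice the $l_\infty$ error. The function names and the chosen values of k are illustrative.

```python
import math

def x_k(k):
    """k-th term of the sequence from Example 3."""
    return [1.0, 2.0 + 1.0 / k, 3.0 / k**2, math.exp(-k) * math.sin(k)]

limit = [1.0, 2.0, 0.0, 0.0]

def errors(k):
    """Return (l_infinity error, l_2 error) of the k-th term."""
    d = [a - b for a, b in zip(x_k(k), limit)]
    err_inf = max(abs(v) for v in d)
    err_2 = math.sqrt(sum(v * v for v in d))
    return err_inf, err_2

for k in (1, 10, 100, 1000):
    err_inf, err_2 = errors(k)
    print(k, err_inf, err_2)
```

Both error columns shrink toward zero as k grows, at roughly the rate 1/k set by the second component.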
It can be shown that all norms on $\mathbb{R}^n$ are equivalent with respect to convergence; that
is, if $\|\cdot\|$ and $\|\cdot\|'$ are any two norms on $\mathbb{R}^n$ and $\{\mathbf{x}^{(k)}\}_{k=1}^{\infty}$ has the limit $\mathbf{x}$ with respect to
$\|\cdot\|$, then $\{\mathbf{x}^{(k)}\}_{k=1}^{\infty}$ also has the limit $\mathbf{x}$ with respect to $\|\cdot\|'$. The proof of this fact for the
general case can be found in [Or2], p. 8. The case for the $l_2$ and $l_\infty$ norms follows from
Theorem 7.7.


Matrix Norms and Distances 


In the subsequent sections of this and later chapters, we will need methods for determining
the distance between n × n matrices. This again requires the use of a norm.

Definition 7.8   A matrix norm on the set of all n × n matrices is a real-valued function, $\|\cdot\|$, defined on
this set, satisfying for all n × n matrices $A$ and $B$ and all real numbers $\alpha$:
(i) $\|A\| \ge 0$;
(ii) $\|A\| = 0$, if and only if $A$ is $O$, the matrix with all 0 entries;
(iii) $\|\alpha A\| = |\alpha|\,\|A\|$;
(iv) $\|A + B\| \le \|A\| + \|B\|$;
(v) $\|AB\| \le \|A\|\,\|B\|$.

The distance between n × n matrices $A$ and $B$ with respect to this matrix norm is
$\|A - B\|$.

Although matrix norms can be obtained in various ways, the norms considered most
frequently are those that are natural consequences of the vector norms $l_2$ and $l_\infty$.

These norms are defined using the following theorem, whose proof is considered in
Exercise 13.


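Theorem 7.9   If $\|\cdot\|$ is a vector norm on $\mathbb{R}^n$, then

\[ \|A\| = \max_{\|\mathbf{x}\| = 1} \|A\mathbf{x}\| \tag{7.2} \]

is a matrix norm.

Matrix norms defined by vector norms are called the natural, or induced, matrix norm
associated with the vector norm. In this text, all matrix norms will be assumed to be natural
matrix norms unless specified otherwise.

Every vector norm produces an associated natural matrix norm.

For any $\mathbf{z} \ne \mathbf{0}$, the vector $\mathbf{x} = \mathbf{z}/\|\mathbf{z}\|$ is a unit vector. Hence

\[ \max_{\|\mathbf{x}\| = 1} \|A\mathbf{x}\| = \max_{\mathbf{z} \ne \mathbf{0}} \left\| A\left( \frac{\mathbf{z}}{\|\mathbf{z}\|} \right) \right\| = \max_{\mathbf{z} \ne \mathbf{0}} \frac{\|A\mathbf{z}\|}{\|\mathbf{z}\|}, \]

and we can alternatively write

\[ \|A\| = \max_{\mathbf{z} \ne \mathbf{0}} \frac{\|A\mathbf{z}\|}{\|\mathbf{z}\|}. \tag{7.3} \]

The following corollary to Theorem 7.9 follows from this representation of $\|A\|$.

Corollary 7.10   For any vector $\mathbf{z} \ne \mathbf{0}$, matrix $A$, and any natural norm $\|\cdot\|$, we have

\[ \|A\mathbf{z}\| \le \|A\| \cdot \|\mathbf{z}\|. \]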
The measure given to a matrix under a natural norm describes how the matrix stretches
unit vectors relative to that norm. The maximum stretch is the norm of the matrix. The
matrix norms we will consider have the forms

\[ \|A\|_\infty = \max_{\|\mathbf{x}\|_\infty = 1} \|A\mathbf{x}\|_\infty, \quad \text{the } l_\infty \text{ norm}, \]

and

\[ \|A\|_2 = \max_{\|\mathbf{x}\|_2 = 1} \|A\mathbf{x}\|_2, \quad \text{the } l_2 \text{ norm}. \]


An illustration of these norms when n = 2 is shown in Figures 7.4 and 7.5 for the 


a=[3 ol 


matrix 


Figure 7.4 


Figure 7.5 
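The $l_\infty$ norm of a matrix can be easily computed from the entries of the matrix.

Theorem 7.11   If $A = (a_{ij})$ is an n × n matrix, then

\[ \|A\|_\infty = \max_{1 \le i \le n} \sum_{j=1}^{n} |a_{ij}|. \]

Proof   First we show that $\|A\|_\infty \le \max_{1 \le i \le n} \sum_{j=1}^{n} |a_{ij}|$.
Let $\mathbf{x}$ be an n-dimensional vector with $1 = \|\mathbf{x}\|_\infty = \max_{1 \le i \le n} |x_i|$. Since $A\mathbf{x}$ is also an
n-dimensional vector,

\[ \|A\mathbf{x}\|_\infty = \max_{1 \le i \le n} |(A\mathbf{x})_i| = \max_{1 \le i \le n} \left| \sum_{j=1}^{n} a_{ij} x_j \right| \le \max_{1 \le i \le n} \sum_{j=1}^{n} |a_{ij}| \, \max_{1 \le j \le n} |x_j|. \]

But $\max_{1 \le j \le n} |x_j| = \|\mathbf{x}\|_\infty = 1$, so

\[ \|A\mathbf{x}\|_\infty \le \max_{1 \le i \le n} \sum_{j=1}^{n} |a_{ij}|, \]

and consequently,

\[ \|A\|_\infty = \max_{\|\mathbf{x}\|_\infty = 1} \|A\mathbf{x}\|_\infty \le \max_{1 \le i \le n} \sum_{j=1}^{n} |a_{ij}|. \tag{7.4} \]

Now we will show the opposite inequality. Let $p$ be an integer with

\[ \sum_{j=1}^{n} |a_{pj}| = \max_{1 \le i \le n} \sum_{j=1}^{n} |a_{ij}|, \]

and $\mathbf{x}$ be the vector with components

\[ x_j = \begin{cases} 1, & \text{if } a_{pj} \ge 0, \\ -1, & \text{if } a_{pj} < 0. \end{cases} \]

Then $\|\mathbf{x}\|_\infty = 1$ and $a_{pj} x_j = |a_{pj}|$, for all $j = 1, 2, \ldots, n$, so

\[ \|A\mathbf{x}\|_\infty = \max_{1 \le i \le n} \left| \sum_{j=1}^{n} a_{ij} x_j \right| \ge \left| \sum_{j=1}^{n} a_{pj} x_j \right| = \left| \sum_{j=1}^{n} |a_{pj}| \right| = \max_{1 \le i \le n} \sum_{j=1}^{n} |a_{ij}|. \]

This result implies that

\[ \|A\|_\infty = \max_{\|\mathbf{x}\|_\infty = 1} \|A\mathbf{x}\|_\infty \ge \max_{1 \le i \le n} \sum_{j=1}^{n} |a_{ij}|. \]

Putting this together with Inequality (7.4) gives $\|A\|_\infty = \max_{1 \le i \le n} \sum_{j=1}^{n} |a_{ij}|$.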




Example 5   Determine $\|A\|_\infty$ for the matrix

\[ A = \begin{bmatrix} 1 & 2 & -1 \\ 0 & 3 & -1 \\ 5 & -1 & 1 \end{bmatrix}. \]

Solution   We have

\[ \sum_{j=1}^{3} |a_{1j}| = |1| + |2| + |-1| = 4, \qquad \sum_{j=1}^{3} |a_{2j}| = |0| + |3| + |-1| = 4, \]

and

\[ \sum_{j=1}^{3} |a_{3j}| = |5| + |-1| + |1| = 7. \]

So Theorem 7.11 implies that $\|A\|_\infty = \max\{4, 4, 7\} = 7$.
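
The row-sum formula of Theorem 7.11 is a one-line computation. The sketch below applies it to the matrix of Example 5 and also builds the sign vector used in the proof to confirm that the maximum is attained; the function names are illustrative.

```python
def matrix_norm_inf(A):
    """Maximum absolute row sum (Theorem 7.11)."""
    return max(sum(abs(a) for a in row) for row in A)

def maximizing_vector(A):
    """Sign vector from the proof: +1 where a_pj >= 0, -1 otherwise,
    with p a row having the largest absolute row sum."""
    p = max(range(len(A)), key=lambda i: sum(abs(a) for a in A[i]))
    return [1 if a >= 0 else -1 for a in A[p]]

A = [[1, 2, -1],
     [0, 3, -1],
     [5, -1, 1]]

x = maximizing_vector(A)
Ax = [sum(a * xj for a, xj in zip(row, x)) for row in A]

print(matrix_norm_inf(A))        # 7
print(max(abs(v) for v in Ax))   # 7, attained by the unit vector x
```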


In the next section, we will discover an alternative method for finding the $l_2$ norm of a
matrix.


EXERCISE SET 7.1 


1. Find the $l_\infty$ and $l_2$ norms of the vectors.
a. $\mathbf{x} = (3, -4, 0, 3)^t$
b. $\mathbf{x} = (2, 1, -3, 4)^t$
c. $\mathbf{x} = (\sin k, \cos k, 2^k)^t$ for a fixed positive integer $k$
d. $\mathbf{x} = (4/(k + 1), 2/k^2, k^2 e^{-k})^t$ for a fixed positive integer $k$
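2. a. Verify that the function $\|\cdot\|_1$, defined on $\mathbb{R}^n$ by

\[ \|\mathbf{x}\|_1 = \sum_{i=1}^{n} |x_i|, \]

is a norm on $\mathbb{R}^n$.
b. Find $\|\mathbf{x}\|_1$ for the vectors given in Exercise 1.
c. Prove that for all $\mathbf{x} \in \mathbb{R}^n$, $\|\mathbf{x}\|_1 \ge \|\mathbf{x}\|_2$.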
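3. Prove that the following sequences are convergent, and find their limits.
a. $\mathbf{x}^{(k)} = (1/k, e^{1-k}, -2/k^2)^t$
b. $\mathbf{x}^{(k)} = (e^{-k}\cos k, k\sin(1/k), 3 + k^{-2})^t$
c. $\mathbf{x}^{(k)} = (k e^{-k^2}, (\cos k)/k, \sqrt{k^2 + k} - k)^t$
d. $\mathbf{x}^{(k)} = (e^{1/k}, (k^2 + 1)/(1 - k^2), (1/k^4)(1 + 3 + 5 + \cdots + (2k - 1)))^t$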


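4. Find the $l_\infty$ norm of the matrices.

a. $\begin{bmatrix} 10 & 15 \\ 0 & 1 \end{bmatrix}$        b. $\begin{bmatrix} 10 & 0 \\ 15 & 1 \end{bmatrix}$

c. $\begin{bmatrix} 2 & -1 & 0 \\ -1 & 2 & -1 \\ 0 & -1 & 2 \end{bmatrix}$        d. $\begin{bmatrix} 4 & -1 & 7 \\ -1 & 4 & 0 \\ -7 & 0 & 4 \end{bmatrix}$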
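5. The following linear systems $A\mathbf{x} = \mathbf{b}$ have $\mathbf{x}$ as the actual solution and $\tilde{\mathbf{x}}$ as an approximate solution.
Compute $\|\mathbf{x} - \tilde{\mathbf{x}}\|_\infty$ and $\|A\tilde{\mathbf{x}} - \mathbf{b}\|_\infty$.

a. $\frac{1}{2}x_1 + \frac{1}{3}x_2 = \frac{1}{63}$,
   $\frac{1}{3}x_1 + \frac{1}{4}x_2 = \frac{1}{168}$,
   $\mathbf{x} = \left(\frac{1}{7}, -\frac{1}{6}\right)^t$,
   $\tilde{\mathbf{x}} = (0.142, -0.166)^t$.

b. $x_1 + 2x_2 + 3x_3 = 1$,
   $2x_1 + 3x_2 + 4x_3 = -1$,
   $3x_1 + 4x_2 + 6x_3 = 2$,
   $\mathbf{x} = (0, -7, 5)^t$,
   $\tilde{\mathbf{x}} = (-0.33, -7.9, 5.8)^t$.

c. $x_1 + 2x_2 + 3x_3 = 1$,
   $2x_1 + 3x_2 + 4x_3 = -1$,
   $3x_1 + 4x_2 + 6x_3 = 2$,
   $\mathbf{x} = (0, -7, 5)^t$,
   $\tilde{\mathbf{x}} = (-0.2, -7.5, 5.4)^t$.

d. $0.04x_1 + 0.01x_2 - 0.01x_3 = 0.06$,
   $0.2x_1 + 0.5x_2 - 0.2x_3 = 0.3$,
   $x_1 + 2x_2 + 4x_3 = 11$,
   $\mathbf{x} = (1.827586, 0.6551724, 1.965517)^t$,
   $\tilde{\mathbf{x}} = (1.8, 0.64, 1.9)^t$.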

6. The matrix norm $\|\cdot\|_1$, defined by $\|A\|_1 = \max_{\|\mathbf{x}\|_1 = 1} \|A\mathbf{x}\|_1$, can be computed using the formula

\[ \|A\|_1 = \max_{1 \le j \le n} \sum_{i=1}^{n} |a_{ij}|, \]

where the vector norm $\|\cdot\|_1$ is defined in Exercise 2. Find $\|\cdot\|_1$ for the matrices in Exercise 4.

7. Show by example that $\|\cdot\|_{\circledast}$, defined by $\|A\|_{\circledast} = \max_{1 \le i,j \le n} |a_{ij}|$, does not define a matrix norm.

8. Show that $\|\cdot\|_{①}$, defined by

\[ \|A\|_{①} = \sum_{i=1}^{n} \sum_{j=1}^{n} |a_{ij}|, \]

is a matrix norm. Find $\|\cdot\|_{①}$ for the matrices in Exercise 4.

9. a. The Frobenius norm (which is not a natural norm) is defined for an n × n matrix $A$ by

\[ \|A\|_F = \left( \sum_{i=1}^{n} \sum_{j=1}^{n} |a_{ij}|^2 \right)^{1/2}. \]

Show that $\|\cdot\|_F$ is a matrix norm.
b. Find $\|\cdot\|_F$ for the matrices in Exercise 4.
c. For any matrix $A$, show that $\|A\|_2 \le \|A\|_F \le n^{1/2}\|A\|_2$.

10. In Exercise 9 the Frobenius norm of a matrix was defined. Show that for any n × n matrix $A$ and vector
$\mathbf{x}$ in $\mathbb{R}^n$, $\|A\mathbf{x}\|_2 \le \|A\|_F \|\mathbf{x}\|_2$.

11. Let $S$ be a positive definite n × n matrix. For any $\mathbf{x}$ in $\mathbb{R}^n$ define $\|\mathbf{x}\| = (\mathbf{x}^t S \mathbf{x})^{1/2}$. Show that this
defines a norm on $\mathbb{R}^n$. [Hint: Use the Cholesky factorization of $S$ to show that $\mathbf{x}^t S \mathbf{y} = \mathbf{y}^t S \mathbf{x} \le
(\mathbf{x}^t S \mathbf{x})^{1/2} (\mathbf{y}^t S \mathbf{y})^{1/2}$.]

12. Let $S$ be a real and nonsingular matrix, and let $\|\cdot\|$ be any norm on $\mathbb{R}^n$. Define $\|\cdot\|'$ by $\|\mathbf{x}\|' = \|S\mathbf{x}\|$.
Show that $\|\cdot\|'$ is also a norm on $\mathbb{R}^n$.

13. Prove that if $\|\cdot\|$ is a vector norm on $\mathbb{R}^n$, then $\|A\| = \max_{\|\mathbf{x}\| = 1} \|A\mathbf{x}\|$ is a matrix norm.


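14. The following excerpt from the Mathematics Magazine [Sz] gives an alternative way to prove the
Cauchy-Buniakowsky-Schwarz Inequality.

a. Show that when $\mathbf{x} \ne \mathbf{0}$ and $\mathbf{y} \ne \mathbf{0}$, we have

\[ \frac{\sum_{i=1}^{n} x_i y_i}{\left( \sum_{i=1}^{n} x_i^2 \right)^{1/2} \left( \sum_{i=1}^{n} y_i^2 \right)^{1/2}} = 1 - \frac{1}{2} \sum_{i=1}^{n} \left( \frac{x_i}{\left( \sum_{j=1}^{n} x_j^2 \right)^{1/2}} - \frac{y_i}{\left( \sum_{j=1}^{n} y_j^2 \right)^{1/2}} \right)^2. \]

b. Use the result in part (a) to show that

\[ \sum_{i=1}^{n} x_i y_i \le \left( \sum_{i=1}^{n} x_i^2 \right)^{1/2} \left( \sum_{i=1}^{n} y_i^2 \right)^{1/2}. \]

15. Show that the Cauchy-Buniakowsky-Schwarz Inequality can be strengthened to

\[ \sum_{i=1}^{n} x_i y_i \le \sum_{i=1}^{n} |x_i y_i| \le \left( \sum_{i=1}^{n} x_i^2 \right)^{1/2} \left( \sum_{i=1}^{n} y_i^2 \right)^{1/2}. \]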

7.2 Eigenvalues and Eigenvectors


An n × m matrix can be considered as a function that uses matrix multiplication to take
m-dimensional column vectors into n-dimensional column vectors. So an n × m matrix is
actually a linear function from $\mathbb{R}^m$ to $\mathbb{R}^n$. A square matrix $A$ takes the set of n-dimensional
vectors into itself, which gives a linear function from $\mathbb{R}^n$ to $\mathbb{R}^n$. In this case, certain nonzero
vectors $\mathbf{x}$ might be parallel to $A\mathbf{x}$, which means that a constant $\lambda$ exists with $A\mathbf{x} = \lambda\mathbf{x}$. For
these vectors, we have $(A - \lambda I)\mathbf{x} = \mathbf{0}$. There is a close connection between these numbers $\lambda$
and the likelihood that an iterative method will converge. We will consider this connection
in this section.

Definition 7.12   If $A$ is a square matrix, the characteristic polynomial of $A$ is defined by

\[ p(\lambda) = \det(A - \lambda I). \]

It is not difficult to show (see Exercise 13) that $p$ is an nth-degree polynomial and,
consequently, has at most n distinct zeros, some of which might be complex. If $\lambda$ is a zero
of $p$, then, since $\det(A - \lambda I) = 0$, Theorem 6.17 on page 398 implies that the linear system
defined by $(A - \lambda I)\mathbf{x} = \mathbf{0}$ has a solution with $\mathbf{x} \ne \mathbf{0}$. We wish to study the zeros of $p$ and
the nonzero solutions corresponding to these systems.

Definition 7.13   If $p$ is the characteristic polynomial of the matrix $A$, the zeros of $p$ are eigenvalues,
or characteristic values, of the matrix $A$. If $\lambda$ is an eigenvalue of $A$ and $\mathbf{x} \ne \mathbf{0}$ satisfies
$(A - \lambda I)\mathbf{x} = \mathbf{0}$, then $\mathbf{x}$ is an eigenvector, or characteristic vector, of $A$ corresponding to
the eigenvalue $\lambda$.

The prefix eigen comes from the German adjective meaning “to own”, and is synonymous in
English with the word characteristic. Each matrix has its own eigen- or characteristic
equation, with corresponding eigen- or characteristic values and functions.

To determine the eigenvalues of a matrix, we can use the fact that

• $\lambda$ is an eigenvalue of $A$ if and only if $\det(A - \lambda I) = 0$.

Once an eigenvalue $\lambda$ has been found, a corresponding eigenvector $\mathbf{x} \ne \mathbf{0}$ is determined by
solving the system

• $(A - \lambda I)\mathbf{x} = \mathbf{0}$.


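Example 1   Show that there are no nonzero vectors $\mathbf{x}$ in $\mathbb{R}^2$ with $A\mathbf{x}$ parallel to $\mathbf{x}$ if

\[ A = \begin{bmatrix} 0 & 1 \\ -1 & 0 \end{bmatrix}. \]

Solution   The eigenvalues of $A$ are the solutions to the characteristic equation

\[ 0 = \det(A - \lambda I) = \det \begin{bmatrix} -\lambda & 1 \\ -1 & -\lambda \end{bmatrix} = \lambda^2 + 1, \]

so the eigenvalues of $A$ are the complex numbers $\lambda_1 = i$ and $\lambda_2 = -i$. A corresponding
eigenvector $\mathbf{x}$ for $\lambda_1$ needs to satisfy

\[ \begin{bmatrix} 0 \\ 0 \end{bmatrix} = \begin{bmatrix} -i & 1 \\ -1 & -i \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \end{bmatrix} = \begin{bmatrix} -ix_1 + x_2 \\ -x_1 - ix_2 \end{bmatrix}, \]

that is, $0 = -ix_1 + x_2$, so $x_2 = ix_1$, and $0 = -x_1 - ix_2$. Hence if $\mathbf{x}$ is an eigenvector of
$A$, its components cannot both be real (for example, if $x_1$ is real and nonzero, then $x_2 = ix_1$
is not real; the case $\lambda_2 = -i$ is similar). As a consequence,
there are no nonzero vectors $\mathbf{x}$ in $\mathbb{R}^2$ with $A\mathbf{x}$ parallel to $\mathbf{x}$.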


If $\mathbf{x}$ is an eigenvector associated with the real eigenvalue $\lambda$, then $A\mathbf{x} = \lambda\mathbf{x}$, so the matrix
$A$ takes the vector $\mathbf{x}$ into a scalar multiple of itself.

• If $\lambda$ is real and $\lambda > 1$, then $A$ has the effect of stretching $\mathbf{x}$ by a factor of $\lambda$, as illustrated
in Figure 7.6(a).

• If $0 < \lambda < 1$, then $A$ shrinks $\mathbf{x}$ by a factor of $\lambda$ (see Figure 7.6(b)).

• If $\lambda < 0$, the effects are similar (see Figure 7.6(c) and (d)), although the direction of $A\mathbf{x}$
is reversed.


Figure 7.6   The vectors $\mathbf{x}$ and $A\mathbf{x} = \lambda\mathbf{x}$ for (a) $\lambda > 1$, (b) $1 > \lambda > 0$, (c) $\lambda < -1$, and (d) $-1 < \lambda < 0$.


Notice also that if $\mathbf{x}$ is an eigenvector of $A$ associated with the eigenvalue $\lambda$ and $\alpha$ is
any nonzero constant, then $\alpha\mathbf{x}$ is also an eigenvector since

\[ A(\alpha\mathbf{x}) = \alpha(A\mathbf{x}) = \alpha(\lambda\mathbf{x}) = \lambda(\alpha\mathbf{x}). \]

An important consequence of this is that for any vector norm $\|\cdot\|$ we could choose the
constant $\alpha = \pm\|\mathbf{x}\|^{-1}$, which would result in $\alpha\mathbf{x}$ being an eigenvector with norm 1. So

• For every eigenvalue and any vector norm there are eigenvectors with norm 1.


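Example 2   Determine the eigenvalues and eigenvectors for the matrix

\[ A = \begin{bmatrix} 2 & 0 & 0 \\ 1 & 1 & 2 \\ 1 & -1 & 4 \end{bmatrix}. \]

Solution   The characteristic polynomial of $A$ is

\[ p(\lambda) = \det(A - \lambda I) = \det \begin{bmatrix} 2 - \lambda & 0 & 0 \\ 1 & 1 - \lambda & 2 \\ 1 & -1 & 4 - \lambda \end{bmatrix} = -(\lambda^3 - 7\lambda^2 + 16\lambda - 12) = -(\lambda - 3)(\lambda - 2)^2, \]

so there are two eigenvalues of $A$: $\lambda_1 = 3$ and $\lambda_2 = 2$.
An eigenvector $\mathbf{x}_1$ corresponding to the eigenvalue $\lambda_1 = 3$ is a solution to the vector-matrix
equation $(A - 3 \cdot I)\mathbf{x}_1 = \mathbf{0}$, so

\[ \begin{bmatrix} 0 \\ 0 \\ 0 \end{bmatrix} = \begin{bmatrix} -1 & 0 & 0 \\ 1 & -2 & 2 \\ 1 & -1 & 1 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix}, \]

which implies that $x_1 = 0$ and $x_2 = x_3$.

Any nonzero value of $x_3$ produces an eigenvector for the eigenvalue $\lambda_1 = 3$. For
example, when $x_3 = 1$ we have the eigenvector $\mathbf{x}_1 = (0, 1, 1)^t$, and any eigenvector of $A$
corresponding to $\lambda = 3$ is a nonzero multiple of $\mathbf{x}_1$.

An eigenvector $\mathbf{x} \ne \mathbf{0}$ of $A$ associated with $\lambda_2 = 2$ is a solution of the system
$(A - 2 \cdot I)\mathbf{x} = \mathbf{0}$, so

\[ \begin{bmatrix} 0 \\ 0 \\ 0 \end{bmatrix} = \begin{bmatrix} 0 & 0 & 0 \\ 1 & -1 & 2 \\ 1 & -1 & 2 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix}. \]

In this case the eigenvector has only to satisfy the equation

\[ x_1 - x_2 + 2x_3 = 0, \]

which can be done in various ways. For example, when $x_1 = 0$ we have $x_2 = 2x_3$, so
one choice would be $\mathbf{x}_2 = (0, 2, 1)^t$. We could also choose $x_2 = 0$, which requires that
$x_1 = -2x_3$. Hence $\mathbf{x}_3 = (-2, 0, 1)^t$ gives a second eigenvector for the eigenvalue $\lambda_2 = 2$
that is not a multiple of $\mathbf{x}_2$. The eigenvectors of $A$ corresponding to the eigenvalue $\lambda_2 = 2$
generate an entire plane. This plane is described by all vectors of the form

\[ \alpha\mathbf{x}_2 + \beta\mathbf{x}_3 = (-2\beta, 2\alpha, \alpha + \beta)^t, \]

for arbitrary constants $\alpha$ and $\beta$, provided that at least one of the constants is nonzero.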
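
The eigenpairs found in Example 2 can be verified by checking $A\mathbf{x} = \lambda\mathbf{x}$ directly. The sketch below uses exact integer arithmetic; the helper names and the particular values of α and β are illustrative choices.

```python
def matvec(A, x):
    """Matrix-vector product for a matrix stored as a list of rows."""
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]

def is_eigenpair(A, lam, x):
    """True if x is nonzero and A x equals lam * x exactly."""
    return any(xi != 0 for xi in x) and matvec(A, x) == [lam * xi for xi in x]

A = [[2, 0, 0],
     [1, 1, 2],
     [1, -1, 4]]

print(is_eigenpair(A, 3, [0, 1, 1]))    # True
print(is_eigenpair(A, 2, [0, 2, 1]))    # True
print(is_eigenpair(A, 2, [-2, 0, 1]))   # True

# Any nonzero combination alpha*x2 + beta*x3 stays in the eigenspace for 2.
alpha, beta = 5, -7
w = [-2 * beta, 2 * alpha, alpha + beta]
print(is_eigenpair(A, 2, w))            # True
```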


The package LinearAlgebra in Maple provides the function Eigenvalues to compute
eigenvalues. The function Eigenvectors gives both the eigenvalues and the corresponding
eigenvectors of a matrix. To produce results for the matrix in Example 2, we first load the
package with

    with(LinearAlgebra)

Then we enter the matrix

    A := Matrix([[2, 0, 0], [1, 1, 2], [1, -1, 4]])

giving

\[ \begin{bmatrix} 2 & 0 & 0 \\ 1 & 1 & 2 \\ 1 & -1 & 4 \end{bmatrix} \]

To determine the eigenvalues and eigenvectors we use

    evalf(Eigenvectors(A))

which returns

\[ \begin{bmatrix} 3 \\ 2 \\ 2 \end{bmatrix}, \quad \begin{bmatrix} 0 & -2 & 1 \\ 1 & 0 & 1 \\ 1 & 1 & 0 \end{bmatrix} \]

implying that the eigenvalues are 3, 2, and 2 with corresponding eigenvectors given by the
respective columns as $(0, 1, 1)^t$, $(-2, 0, 1)^t$, and $(1, 1, 0)^t$.

The LinearAlgebra package also contains the command CharacteristicPolynomial, so
the eigenvalues could also be obtained with

    p := CharacteristicPolynomial(A, lambda); factor(p)

This gives

\[ -12 + \lambda^3 - 7\lambda^2 + 16\lambda \]
\[ (\lambda - 3)(\lambda - 2)^2 \]

The notions of eigenvalues and eigenvectors are introduced here for a specific computational
convenience, but these concepts arise frequently in the study of physical systems. In
fact, they are of sufficient interest that Chapter 9 is devoted to their numerical approximation.

Spectral Radius

Definition 7.14   The spectral radius $\rho(A)$ of a matrix $A$ is defined by

\[ \rho(A) = \max |\lambda|, \quad \text{where } \lambda \text{ is an eigenvalue of } A. \]

(For complex $\lambda = \alpha + \beta i$, we define $|\lambda| = (\alpha^2 + \beta^2)^{1/2}$.)

For the matrix considered in Example 2, $\rho(A) = \max\{2, 3\} = 3$.
The spectral radius is closely related to the norm of a matrix, as shown in the following
theorem.

Theorem 7.15   If $A$ is an n × n matrix, then
(i) $\|A\|_2 = [\rho(A^tA)]^{1/2}$,
(ii) $\rho(A) \le \|A\|$, for any natural norm $\|\cdot\|$.


Proof   The proof of part (i) requires more information concerning eigenvalues than we
presently have available. For the details involved in the proof, see [Or2], p. 21.

To prove part (ii), suppose $\lambda$ is an eigenvalue of $A$ with eigenvector $\mathbf{x}$ and $\|\mathbf{x}\| = 1$.
Then $A\mathbf{x} = \lambda\mathbf{x}$ and

\[ |\lambda| = |\lambda| \cdot \|\mathbf{x}\| = \|\lambda\mathbf{x}\| = \|A\mathbf{x}\| \le \|A\|\,\|\mathbf{x}\| = \|A\|. \]

Thus

\[ \rho(A) = \max |\lambda| \le \|A\|. \]

Part (i) of Theorem 7.15 implies that if $A$ is symmetric, then $\|A\|_2 = \rho(A)$ (see
Exercise 16).

An interesting and useful result, which is similar to part (ii) of Theorem 7.15, is that
for any matrix $A$ and any $\varepsilon > 0$, there exists a natural norm $\|\cdot\|$ with the property that
$\rho(A) \le \|A\| < \rho(A) + \varepsilon$. Consequently, $\rho(A)$ is the greatest lower bound for the natural
norms on $A$. The proof of this result can be found in [Or2], p. 23.


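Example 3   Determine the $l_2$ norm of

\[ A = \begin{bmatrix} 1 & 1 & 0 \\ 1 & 2 & 1 \\ -1 & 1 & 2 \end{bmatrix}. \]

Solution   To apply Theorem 7.15 we need to calculate $\rho(A^tA)$, so we first need the eigenvalues
of $A^tA$.

\[ A^tA = \begin{bmatrix} 1 & 1 & -1 \\ 1 & 2 & 1 \\ 0 & 1 & 2 \end{bmatrix} \begin{bmatrix} 1 & 1 & 0 \\ 1 & 2 & 1 \\ -1 & 1 & 2 \end{bmatrix} = \begin{bmatrix} 3 & 2 & -1 \\ 2 & 6 & 4 \\ -1 & 4 & 5 \end{bmatrix}. \]

If

\[ 0 = \det(A^tA - \lambda I) = \det \begin{bmatrix} 3 - \lambda & 2 & -1 \\ 2 & 6 - \lambda & 4 \\ -1 & 4 & 5 - \lambda \end{bmatrix} = -\lambda^3 + 14\lambda^2 - 42\lambda = -\lambda(\lambda^2 - 14\lambda + 42), \]

then $\lambda = 0$ or $\lambda = 7 \pm \sqrt{7}$. By Theorem 7.15 we have

\[ \|A\|_2 = \sqrt{\rho(A^tA)} = \sqrt{\max\{0, 7 - \sqrt{7}, 7 + \sqrt{7}\}} = \sqrt{7 + \sqrt{7}} \approx 3.106. \]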
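
The value in Example 3 can be cross-checked without solving the characteristic equation. The sketch below forms $A^tA$ and estimates its largest eigenvalue with a basic power iteration, a technique swapped in here for illustration (it is not the method used in the example). It assumes the starting vector has a component along the dominant eigenvector, which holds for this matrix; the helper names and iteration count are illustrative.

```python
import math

def transpose(A):
    return [list(col) for col in zip(*A)]

def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def matvec(A, x):
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]

def dominant_eigenvalue(M, iters=200):
    """Power iteration for a symmetric positive semidefinite matrix M.
    Returns an estimate of the largest eigenvalue, i.e. rho(M)."""
    n = len(M)
    x = [1.0 / math.sqrt(n)] * n               # unit starting vector
    lam = 0.0
    for _ in range(iters):
        y = matvec(M, x)
        lam = math.sqrt(sum(v * v for v in y))  # ||M x||_2 with ||x||_2 = 1
        x = [v / lam for v in y]
    return lam

A = [[1, 1, 0],
     [1, 2, 1],
     [-1, 1, 2]]

AtA = matmul(transpose(A), A)
norm_2_of_A = math.sqrt(dominant_eigenvalue(AtA))
print(AtA)
print(norm_2_of_A)   # close to sqrt(7 + sqrt(7))
```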


The operations in Example 3 can also be performed using the LinearAlgebra package
in Maple by first loading the package and then entering the matrix.

    with(LinearAlgebra): A := Matrix([[1, 1, 0], [1, 2, 1], [-1, 1, 2]])

Maple will respond by showing the matrix that was entered. To determine the transpose of
$A$ we use

    B := Transpose(A)

which gives

\[ \begin{bmatrix} 1 & 1 & -1 \\ 1 & 2 & 1 \\ 0 & 1 & 2 \end{bmatrix} \]

Then we can compute the product $BA = A^tA$ with

    C := B.A

which produces

\[ \begin{bmatrix} 3 & 2 & -1 \\ 2 & 6 & 4 \\ -1 & 4 & 5 \end{bmatrix} \]

The command

    evalf(Eigenvalues(C))

gives the vector

\[ \begin{bmatrix} 0. \\ 9.645751311 \\ 4.354248689 \end{bmatrix} \]

Since $\|A\|_2 = \sqrt{\rho(A^tA)} = \sqrt{\rho(C)}$, we have

\[ \|A\|_2 = \sqrt{9.645751311} = 3.105760987, \]

which we could also find with evalf(Norm(A, 2)).

To determine the $l_\infty$ norm of $A$, replace the last command with evalf(Norm(A, infinity)),
which Maple gives as 4. This is seen to be correct because it is the sum of the magnitudes of
the entries in the second row (the third row gives the same sum).


Convergent Matrices 


In studying iterative matrix techniques, it is of particular importance to know when powers 
of a matrix become small (that is, when all the entries approach zero). Matrices of this type 
are called convergent. 


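Definition 7.16   We call an n × n matrix $A$ convergent if

\[ \lim_{k \to \infty} (A^k)_{ij} = 0, \quad \text{for each } i = 1, 2, \ldots, n \text{ and } j = 1, 2, \ldots, n. \]

Example 4   Show that

\[ A = \begin{bmatrix} \frac{1}{2} & 0 \\ \frac{1}{4} & \frac{1}{2} \end{bmatrix} \]

is a convergent matrix.

Solution   Computing powers of $A$, we obtain:

\[ A^2 = \begin{bmatrix} \frac{1}{4} & 0 \\ \frac{1}{4} & \frac{1}{4} \end{bmatrix}, \quad A^3 = \begin{bmatrix} \frac{1}{8} & 0 \\ \frac{3}{16} & \frac{1}{8} \end{bmatrix}, \quad A^4 = \begin{bmatrix} \frac{1}{16} & 0 \\ \frac{1}{8} & \frac{1}{16} \end{bmatrix}, \]

and, in general,

\[ A^k = \begin{bmatrix} \left(\frac{1}{2}\right)^k & 0 \\ \frac{k}{2^{k+1}} & \left(\frac{1}{2}\right)^k \end{bmatrix}. \]

So $A$ is a convergent matrix because

\[ \lim_{k \to \infty} \left(\frac{1}{2}\right)^k = 0 \quad\text{and}\quad \lim_{k \to \infty} \frac{k}{2^{k+1}} = 0. \]

Notice that the convergent matrix $A$ in Example 4 has $\rho(A) = \frac{1}{2}$, because $\frac{1}{2}$ is the only
eigenvalue of $A$. This illustrates an important connection that exists between the spectral
radius of a matrix and the convergence of the matrix, as detailed in the following result.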
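
The closed form for $A^k$ in Example 4 can be checked with exact rational arithmetic. The sketch below compares repeated multiplication against the formula for the first twenty powers and shows how small the entries become; the helper names and the range of k are illustrative.

```python
from fractions import Fraction

def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def power(A, k):
    """A raised to the integer power k >= 1 by repeated multiplication."""
    P = A
    for _ in range(k - 1):
        P = matmul(P, A)
    return P

def closed_form(k):
    """Entries of A^k as given in Example 4."""
    h = Fraction(1, 2) ** k
    return [[h, Fraction(0)], [Fraction(k, 2 ** (k + 1)), h]]

A = [[Fraction(1, 2), Fraction(0)],
     [Fraction(1, 4), Fraction(1, 2)]]

agrees = all(power(A, k) == closed_form(k) for k in range(1, 21))
largest_entry_50 = max(max(row) for row in power(A, 50))
print(agrees)
print(float(largest_entry_50))
```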
Theorem 7.17   The following statements are equivalent.

(i) $A$ is a convergent matrix.
(ii) $\lim_{n \to \infty} \|A^n\| = 0$, for some natural norm.
(iii) $\lim_{n \to \infty} \|A^n\| = 0$, for all natural norms.
(iv) $\rho(A) < 1$.
(v) $\lim_{n \to \infty} A^n\mathbf{x} = \mathbf{0}$, for every $\mathbf{x}$.

The proof of this theorem can be found in [IK], p. 14.


EXERCISE SET 7.2 


1. Compute the eigenvalues and associated eigenvectors of the following matrices.


2 -1 o 4 0 $ 
Soler wl Sala «(ft 6] 
2 1 0 -1 2 0 2 1 1 
d. 1 2 0 e. 0 3 4 f. 2 3 2 
0 0 3 0 0 7 1 1 2 
2. Compute the eigenvalues and associated eigenvectors of the following matrices.
ae | oar bf tot] e lial 
—2 -2 3 3 1 0 
3. 2 -1 >; 0 0 2 -1 0 
d 1-2 3 e. -1 ; 0 f. 0 2 4 
2 0 4 2 2 -} 0 0 2 


3. Find the complex eigenvalues and associated eigenvectors for the following matrices.


mee mee 


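4. Find the complex eigenvalues and associated eigenvectors for the following matrices.

a. $\begin{bmatrix} 1 & 0 & 2 \\ 0 & 1 & -1 \\ -1 & 1 & 1 \end{bmatrix}$        b. $\begin{bmatrix} 0 & 1 & -2 \\ 1 & 0 & 0 \\ 1 & 1 & 1 \end{bmatrix}$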


5. Find the spectral radius for each matrix in Exercise 1.
6. Find the spectral radius for each matrix in Exercise 2.
7. Which of the matrices in Exercise 1 are convergent?
8. Which of the matrices in Exercise 2 are convergent?
9. Find the $l_2$ norm for the matrices in Exercise 1.
10. Find the $l_2$ norm for the matrices in Exercise 2.


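11. Let $A_1 = \begin{bmatrix} 1 & 0 \\ \frac{1}{4} & \frac{1}{2} \end{bmatrix}$ and $A_2 = \begin{bmatrix} \frac{1}{2} & 0 \\ 16 & \frac{1}{2} \end{bmatrix}$. Show that $A_1$ is not convergent, but $A_2$ is convergent.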


12. An n × n matrix $A$ is called nilpotent if an integer $m$ exists with $A^m = O_n$. Show that if $\lambda$ is an
eigenvalue of a nilpotent matrix, then $\lambda = 0$.

13. Show that the characteristic polynomial $p(\lambda) = \det(A - \lambda I)$ for the n × n matrix $A$ is an nth-degree
polynomial. [Hint: Expand $\det(A - \lambda I)$ along the first row, and use mathematical induction on n.]

14. a. Show that if $A$ is an n × n matrix, then

\[ \det A = \prod_{i=1}^{n} \lambda_i, \]

where $\lambda_1, \ldots, \lambda_n$ are the eigenvalues of $A$. [Hint: Consider $p(0)$.]
b. Show that $A$ is singular if and only if $\lambda = 0$ is an eigenvalue of $A$.
15. Let $\lambda$ be an eigenvalue of the n × n matrix $A$ and $\mathbf{x} \ne \mathbf{0}$ be an associated eigenvector.

a. Show that $\lambda$ is also an eigenvalue of $A^t$.
b. Show that for any integer $k \ge 1$, $\lambda^k$ is an eigenvalue of $A^k$ with eigenvector $\mathbf{x}$.
c. Show that if $A^{-1}$ exists, then $1/\lambda$ is an eigenvalue of $A^{-1}$ with eigenvector $\mathbf{x}$.
d. Generalize parts (b) and (c) to $(A^{-1})^k$ for integers $k \ge 2$.
e. Given the polynomial $q(x) = q_0 + q_1 x + \cdots + q_k x^k$, define $q(A)$ to be the matrix $q(A) =
q_0 I + q_1 A + \cdots + q_k A^k$. Show that $q(\lambda)$ is an eigenvalue of $q(A)$ with eigenvector $\mathbf{x}$.
f. Let $\alpha \ne \lambda$ be given. Show that if $A - \alpha I$ is nonsingular, then $1/(\lambda - \alpha)$ is an eigenvalue of
$(A - \alpha I)^{-1}$ with eigenvector $\mathbf{x}$.

16. Show that if $A$ is symmetric, then $\|A\|_2 = \rho(A)$.

17. In Exercise 15 of Section 6.3, we assumed that the contribution a female beetle of a certain type made
to the future years’ beetle population could be expressed in terms of the matrix

0 0 6 
A=| 3 0 0], 
0 4 0 


where the entry in the ith row and jth column represents the probabilistic contribution of a beetle of
age j onto the next year’s female population of age i.

a. Does the matrix $A$ have any real eigenvalues? If so, determine them and any associated eigenvectors.

b. If a sample of this species was needed for laboratory test purposes that would have a constant
proportion in each age group from year to year, what criteria could be imposed on the initial
population to ensure that this requirement would be satisfied?

18. Find matrices $A$ and $B$ for which $\rho(A + B) > \rho(A) + \rho(B)$. (This shows that $\rho(A)$ cannot be a matrix
norm.)

19. Show that if $\|\cdot\|$ is any natural norm, then $(\|A^{-1}\|)^{-1} \le |\lambda| \le \|A\|$ for any eigenvalue $\lambda$ of the
nonsingular matrix $A$.

7.3 The Jacobi and Gauss-Seidel Iterative Techniques


In this section we describe the Jacobi and the Gauss-Seidel iterative methods, classic 
methods that date to the late eighteenth century. Iterative techniques are seldom used for 
solving linear systems of small dimension since the time required for sufficient accuracy 
exceeds that required for direct techniques such as Gaussian elimination. For large sys- 
tems with a high percentage of 0 entries, however, these techniques are efficient in terms 
of both computer storage and computation. Systems of this type arise frequently in circuit 
analysis and in the numerical solution of boundary-value problems and partial-differential 
equations. 

An iterative technique to solve the n × n linear system $A\mathbf{x} = \mathbf{b}$ starts with an initial
approximation $\mathbf{x}^{(0)}$ to the solution $\mathbf{x}$ and generates a sequence of vectors $\{\mathbf{x}^{(k)}\}_{k=0}^{\infty}$ that
converges to $\mathbf{x}$.


Jacobi’s Method 


The Jacobi iterative method is obtained by solving the ith equation in $A\mathbf{x} = \mathbf{b}$ for $x_i$ to
obtain (provided $a_{ii} \neq 0$)

\[ x_i = \sum_{\substack{j=1 \\ j \neq i}}^{n} \left( -\frac{a_{ij} x_j}{a_{ii}} \right) + \frac{b_i}{a_{ii}}, \quad \text{for } i = 1, 2, \ldots, n. \]

For each $k \geq 1$, generate the components $x_i^{(k)}$ of $\mathbf{x}^{(k)}$ from the components of $\mathbf{x}^{(k-1)}$ by

\[ x_i^{(k)} = \frac{1}{a_{ii}} \left[ \sum_{\substack{j=1 \\ j \neq i}}^{n} \left( -a_{ij} x_j^{(k-1)} \right) + b_i \right], \quad \text{for } i = 1, 2, \ldots, n. \tag{7.5} \]

Carl Gustav Jacob Jacobi (1804-1851) was initially recognized for his work in the area of number theory and elliptic functions, but his mathematical interests and abilities were very broad. He had a strong personality that was influential in establishing a research-oriented attitude that became the nucleus of a revival of mathematics at German universities in the 19th century.

Example 1

The linear system $A\mathbf{x} = \mathbf{b}$ given by

\[
\begin{aligned}
E_1: &\quad 10x_1 - x_2 + 2x_3 = 6, \\
E_2: &\quad -x_1 + 11x_2 - x_3 + 3x_4 = 25, \\
E_3: &\quad 2x_1 - x_2 + 10x_3 - x_4 = -11, \\
E_4: &\quad 3x_2 - x_3 + 8x_4 = 15
\end{aligned}
\]

has the unique solution $\mathbf{x} = (1, 2, -1, 1)^t$. Use Jacobi's iterative technique to find approximations $\mathbf{x}^{(k)}$ to $\mathbf{x}$ starting with $\mathbf{x}^{(0)} = (0, 0, 0, 0)^t$ until

\[ \frac{\|\mathbf{x}^{(k)} - \mathbf{x}^{(k-1)}\|_\infty}{\|\mathbf{x}^{(k)}\|_\infty} < 10^{-3}. \]

Solution We first solve equation $E_i$ for $x_i$, for each i = 1, 2, 3, 4, to obtain
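\[
\begin{aligned}
x_1 &= \tfrac{1}{10} x_2 - \tfrac{1}{5} x_3 + \tfrac{3}{5}, \\
x_2 &= \tfrac{1}{11} x_1 + \tfrac{1}{11} x_3 - \tfrac{3}{11} x_4 + \tfrac{25}{11}, \\
x_3 &= -\tfrac{1}{5} x_1 + \tfrac{1}{10} x_2 + \tfrac{1}{10} x_4 - \tfrac{11}{10}, \\
x_4 &= -\tfrac{3}{8} x_2 + \tfrac{1}{8} x_3 + \tfrac{15}{8}.
\end{aligned}
\]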
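From the initial approximation $\mathbf{x}^{(0)} = (0, 0, 0, 0)^t$ we have $\mathbf{x}^{(1)}$ given by

\[
\begin{aligned}
x_1^{(1)} &= \tfrac{1}{10} x_2^{(0)} - \tfrac{1}{5} x_3^{(0)} + \tfrac{3}{5} = 0.6000, \\
x_2^{(1)} &= \tfrac{1}{11} x_1^{(0)} + \tfrac{1}{11} x_3^{(0)} - \tfrac{3}{11} x_4^{(0)} + \tfrac{25}{11} = 2.2727, \\
x_3^{(1)} &= -\tfrac{1}{5} x_1^{(0)} + \tfrac{1}{10} x_2^{(0)} + \tfrac{1}{10} x_4^{(0)} - \tfrac{11}{10} = -1.1000, \\
x_4^{(1)} &= -\tfrac{3}{8} x_2^{(0)} + \tfrac{1}{8} x_3^{(0)} + \tfrac{15}{8} = 1.8750.
\end{aligned}
\]

Additional iterates, $\mathbf{x}^{(k)} = (x_1^{(k)}, x_2^{(k)}, x_3^{(k)}, x_4^{(k)})^t$, are generated in a similar manner and are
presented in Table 7.1.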
Table 7.1

k           0        1        2        3        4        5        6        7        8        9       10
x_1^(k)  0.0000   0.6000   1.0473   0.9326   1.0152   0.9890   1.0032   0.9981   1.0006   0.9997   1.0001
x_2^(k)  0.0000   2.2727   1.7159   2.053    1.9537   2.0114   1.9922   2.0023   1.9987   2.0004   1.9998
x_3^(k)  0.0000  -1.1000  -0.8052  -1.0493  -0.9681  -1.0103  -0.9945  -1.0020  -0.9990  -1.0004  -0.9998
x_4^(k)  0.0000   1.8750   0.8852   1.1309   0.9739   1.0214   0.9944   1.0036   0.9989   1.0006   0.9998
We stopped after ten iterations because

\[ \frac{\|\mathbf{x}^{(10)} - \mathbf{x}^{(9)}\|_\infty}{\|\mathbf{x}^{(10)}\|_\infty} = \frac{8.0 \times 10^{-4}}{1.9998} < 10^{-3}. \]

In fact, $\|\mathbf{x}^{(10)} - \mathbf{x}\|_\infty = 0.0002$.
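The computation in Example 1 can be sketched in a few lines of Python. This is a minimal illustration, not the text's Algorithm 7.1 verbatim: the function name `jacobi` and the parameters `tol` and `max_iter` are assumptions made for this sketch, and the stopping test is the relative $l_\infty$ criterion stated in the example.

```python
def jacobi(A, b, x0, tol=1e-3, max_iter=100):
    """Jacobi iteration, Eq. (7.5), with a relative l-infinity stopping test."""
    n = len(b)
    x_old = list(x0)
    for k in range(1, max_iter + 1):
        # Every new component is computed from the previous iterate only.
        x_new = [
            (b[i] - sum(A[i][j] * x_old[j] for j in range(n) if j != i)) / A[i][i]
            for i in range(n)
        ]
        diff = max(abs(x_new[i] - x_old[i]) for i in range(n))
        if diff / max(abs(v) for v in x_new) < tol:
            return x_new, k
        x_old = x_new
    raise RuntimeError("Maximum number of iterations exceeded")


# The system of Example 1.
A = [[10, -1, 2, 0], [-1, 11, -1, 3], [2, -1, 10, -1], [0, 3, -1, 8]]
b = [6, 25, -11, 15]
x, iterations = jacobi(A, b, [0.0, 0.0, 0.0, 0.0])
print(iterations, [round(v, 4) for v in x])
```

Because the sketch works in double precision rather than with the rounded table entries, the iteration at which the test is first satisfied need not coincide exactly with the last column of Table 7.1.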


In general, iterative techniques for solving linear systems involve a process that converts
the system $A\mathbf{x} = \mathbf{b}$ into an equivalent system of the form $\mathbf{x} = T\mathbf{x} + \mathbf{c}$ for some fixed matrix
T and vector $\mathbf{c}$. After the initial vector $\mathbf{x}^{(0)}$ is selected, the sequence of approximate solution
vectors is generated by computing

\[ \mathbf{x}^{(k)} = T\mathbf{x}^{(k-1)} + \mathbf{c}, \]

for each k = 1, 2, 3, .... This should be reminiscent of the fixed-point iteration studied in
Chapter 2.

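The Jacobi method can be written in the form $\mathbf{x}^{(k)} = T\mathbf{x}^{(k-1)} + \mathbf{c}$ by splitting A into its
diagonal and off-diagonal parts. To see this, let D be the diagonal matrix whose diagonal
entries are those of A, −L be the strictly lower-triangular part of A, and −U be the strictly
upper-triangular part of A. With this notation,

\[
A = \begin{bmatrix}
a_{11} & a_{12} & \cdots & a_{1n} \\
a_{21} & a_{22} & \cdots & a_{2n} \\
\vdots & \vdots & & \vdots \\
a_{n1} & a_{n2} & \cdots & a_{nn}
\end{bmatrix}
\]

is split into

\[
A = \begin{bmatrix}
a_{11} & 0 & \cdots & 0 \\
0 & a_{22} & \ddots & \vdots \\
\vdots & \ddots & \ddots & 0 \\
0 & \cdots & 0 & a_{nn}
\end{bmatrix}
- \begin{bmatrix}
0 & \cdots & \cdots & 0 \\
-a_{21} & \ddots & & \vdots \\
\vdots & \ddots & \ddots & \vdots \\
-a_{n1} & \cdots & -a_{n,n-1} & 0
\end{bmatrix}
- \begin{bmatrix}
0 & -a_{12} & \cdots & -a_{1n} \\
\vdots & \ddots & \ddots & \vdots \\
\vdots & & \ddots & -a_{n-1,n} \\
0 & \cdots & \cdots & 0
\end{bmatrix}
= D - L - U.
\]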

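The equation $A\mathbf{x} = \mathbf{b}$, or $(D - L - U)\mathbf{x} = \mathbf{b}$, is then transformed into

\[ D\mathbf{x} = (L + U)\mathbf{x} + \mathbf{b}, \]

and, if $D^{-1}$ exists, that is, if $a_{ii} \neq 0$ for each i, then

\[ \mathbf{x} = D^{-1}(L + U)\mathbf{x} + D^{-1}\mathbf{b}. \]

This results in the matrix form of the Jacobi iterative technique:

\[ \mathbf{x}^{(k)} = D^{-1}(L + U)\mathbf{x}^{(k-1)} + D^{-1}\mathbf{b}, \quad k = 1, 2, \ldots. \tag{7.6} \]

Introducing the notation $T_j = D^{-1}(L + U)$ and $\mathbf{c}_j = D^{-1}\mathbf{b}$ gives the Jacobi technique the
form

\[ \mathbf{x}^{(k)} = T_j \mathbf{x}^{(k-1)} + \mathbf{c}_j. \tag{7.7} \]

In practice, Eq. (7.5) is used in computation and Eq. (7.7) for theoretical purposes.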
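Example 2 Express the Jacobi iteration method for the linear system $A\mathbf{x} = \mathbf{b}$ given by

\[
\begin{aligned}
E_1: &\quad 10x_1 - x_2 + 2x_3 = 6, \\
E_2: &\quad -x_1 + 11x_2 - x_3 + 3x_4 = 25, \\
E_3: &\quad 2x_1 - x_2 + 10x_3 - x_4 = -11, \\
E_4: &\quad 3x_2 - x_3 + 8x_4 = 15
\end{aligned}
\]

in the form $\mathbf{x}^{(k)} = T\mathbf{x}^{(k-1)} + \mathbf{c}$.

Solution We saw in Example 1 that the Jacobi method for this system has the form

\[
\begin{aligned}
x_1 &= \tfrac{1}{10} x_2 - \tfrac{1}{5} x_3 + \tfrac{3}{5}, \\
x_2 &= \tfrac{1}{11} x_1 + \tfrac{1}{11} x_3 - \tfrac{3}{11} x_4 + \tfrac{25}{11}, \\
x_3 &= -\tfrac{1}{5} x_1 + \tfrac{1}{10} x_2 + \tfrac{1}{10} x_4 - \tfrac{11}{10}, \\
x_4 &= -\tfrac{3}{8} x_2 + \tfrac{1}{8} x_3 + \tfrac{15}{8}.
\end{aligned}
\]

Hence we have

\[
T = \begin{bmatrix}
0 & \tfrac{1}{10} & -\tfrac{1}{5} & 0 \\
\tfrac{1}{11} & 0 & \tfrac{1}{11} & -\tfrac{3}{11} \\
-\tfrac{1}{5} & \tfrac{1}{10} & 0 & \tfrac{1}{10} \\
0 & -\tfrac{3}{8} & \tfrac{1}{8} & 0
\end{bmatrix}
\quad \text{and} \quad
\mathbf{c} = \begin{bmatrix} \tfrac{3}{5} \\ \tfrac{25}{11} \\ -\tfrac{11}{10} \\ \tfrac{15}{8} \end{bmatrix}.
\]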

Algorithm 7.1 implements the Jacobi iterative technique.


Jacobi Iterative


To solve $A\mathbf{x} = \mathbf{b}$ given an initial approximation $\mathbf{x}^{(0)}$:


INPUT the number of equations and unknowns n; the entries $a_{ij}$, $1 \leq i, j \leq n$ of the
matrix A; the entries $b_i$, $1 \leq i \leq n$ of b; the entries $XO_i$, $1 \leq i \leq n$ of $XO = \mathbf{x}^{(0)}$; tolerance
TOL; maximum number of iterations N.


OUTPUT the approximate solution $x_1, \ldots, x_n$ or a message that the number of iterations
was exceeded.


Step 1 Set k = 1.
Step 2 While ($k \leq N$) do Steps 3-6.
    Step 3 For i = 1, ..., n
        set $x_i = \dfrac{1}{a_{ii}} \left[ -\sum_{j=1,\, j \neq i}^{n} (a_{ij} XO_j) + b_i \right]$.
    Step 4 If $\|\mathbf{x} - \mathbf{XO}\| < TOL$ then OUTPUT $(x_1, \ldots, x_n)$;
        (The procedure was successful.)
        STOP.
    Step 5 Set k = k + 1.
    Step 6 For i = 1, ..., n set $XO_i = x_i$.
Step 7 OUTPUT ('Maximum number of iterations exceeded');
    (The procedure was unsuccessful.)
    STOP.


Phillip Ludwig Seidel (1821-1896) worked as an assistant to Jacobi solving problems on systems of linear equations that resulted from Gauss's work on least squares. These equations generally had off-diagonal elements that were much smaller than those on the diagonal, so the iterative methods were particularly effective. The iterative techniques now known as Jacobi and Gauss-Seidel were both known to Gauss before being applied in this situation, but Gauss's results were not often widely communicated.


Step 3 of the algorithm requires that $a_{ii} \neq 0$, for each i = 1, 2, ..., n. If one of the $a_{ii}$
entries is 0 and the system is nonsingular, a reordering of the equations can be performed
so that no $a_{ii} = 0$. To speed convergence, the equations should be arranged so that $a_{ii}$ is as
large as possible. This subject is discussed in more detail later in this chapter.

Another possible stopping criterion in Step 4 is to iterate until

\[ \frac{\|\mathbf{x}^{(k)} - \mathbf{x}^{(k-1)}\|}{\|\mathbf{x}^{(k)}\|} \]

is smaller than some prescribed tolerance. For this purpose, any convenient norm can be
used, the usual being the $l_\infty$ norm.

The NumericalAnalysis subpackage of the Maple Student package implements the 
Jacobi iterative method. To illustrate this with our example we first enter both Numerical- 
Analysis and LinearAlgebra. 


with(Student[NumericalAnalysis]): with(LinearAlgebra):


Colons are used at the end of the commands to suppress output for both packages. Enter 
the matrix with 


A := Matrix([[10, -1, 2, 0, 6], [-1, 11, -1, 3, 25], [2, -1, 10, -1, -11], [0, 3, -1, 8, 15]])


The following command gives a collection of output that is in agreement with the results in 
Table 7.1. 


IterativeApproximate(A, initialapprox = Vector([0., 0., 0., 0.]), tolerance = 10^(-3),
maxiterations = 20, stoppingcriterion = relative(infinity), method = jacobi,
output = approximates)


If the option output = approximates is omitted, then only the final approximation result is
output. Notice that the initial approximation was specified by [0., 0., 0., 0.], with decimal
points placed after the entries. This was done so that Maple will give the results as 10-digit
decimals. If the specification had simply been [0, 0, 0, 0], the output would have been given
in fractional form.


The Gauss-Seidel Method 


A possible improvement in Algorithm 7.1 can be seen by reconsidering Eq. (7.5). The
components of $\mathbf{x}^{(k-1)}$ are used to compute all the components $x_i^{(k)}$ of $\mathbf{x}^{(k)}$. But, for i > 1,
the components $x_1^{(k)}, \ldots, x_{i-1}^{(k)}$ of $\mathbf{x}^{(k)}$ have already been computed and are expected to be
better approximations to the actual solutions $x_1, \ldots, x_{i-1}$ than are $x_1^{(k-1)}, \ldots, x_{i-1}^{(k-1)}$. It seems
reasonable, then, to compute $x_i^{(k)}$ using these most recently calculated values. That is, to use

\[ x_i^{(k)} = \frac{1}{a_{ii}} \left[ -\sum_{j=1}^{i-1} \left( a_{ij} x_j^{(k)} \right) - \sum_{j=i+1}^{n} \left( a_{ij} x_j^{(k-1)} \right) + b_i \right], \tag{7.8} \]

for each i = 1, 2, ..., n, instead of Eq. (7.5). This modification is called the Gauss-Seidel
iterative technique and is illustrated in the following example.
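Example 3

Use the Gauss-Seidel iterative technique to find approximate solutions to

\[
\begin{aligned}
10x_1 - x_2 + 2x_3 &= 6, \\
-x_1 + 11x_2 - x_3 + 3x_4 &= 25, \\
2x_1 - x_2 + 10x_3 - x_4 &= -11, \\
3x_2 - x_3 + 8x_4 &= 15
\end{aligned}
\]

starting with $\mathbf{x}^{(0)} = (0, 0, 0, 0)^t$ and iterating until

\[ \frac{\|\mathbf{x}^{(k)} - \mathbf{x}^{(k-1)}\|_\infty}{\|\mathbf{x}^{(k)}\|_\infty} < 10^{-3}. \]

Solution The solution $\mathbf{x} = (1, 2, -1, 1)^t$ was approximated by Jacobi's method in Example
1. For the Gauss-Seidel method we write the system, for each k = 1, 2, ..., as

\[
\begin{aligned}
x_1^{(k)} &= \tfrac{1}{10} x_2^{(k-1)} - \tfrac{1}{5} x_3^{(k-1)} + \tfrac{3}{5}, \\
x_2^{(k)} &= \tfrac{1}{11} x_1^{(k)} + \tfrac{1}{11} x_3^{(k-1)} - \tfrac{3}{11} x_4^{(k-1)} + \tfrac{25}{11}, \\
x_3^{(k)} &= -\tfrac{1}{5} x_1^{(k)} + \tfrac{1}{10} x_2^{(k)} + \tfrac{1}{10} x_4^{(k-1)} - \tfrac{11}{10}, \\
x_4^{(k)} &= -\tfrac{3}{8} x_2^{(k)} + \tfrac{1}{8} x_3^{(k)} + \tfrac{15}{8}.
\end{aligned}
\]

When $\mathbf{x}^{(0)} = (0, 0, 0, 0)^t$, we have $\mathbf{x}^{(1)} = (0.6000, 2.3272, -0.9873, 0.8789)^t$. Subsequent
iterations give the values in Table 7.2.

Table 7.2

k           0        1        2        3        4        5
x_1^(k)  0.0000   0.6000   1.030    1.0065   1.0009   1.0001
x_2^(k)  0.0000   2.3272   2.037    2.0036   2.0003   2.0000
x_3^(k)  0.0000  -0.9873  -1.014   -1.0025  -1.0003  -1.0000
x_4^(k)  0.0000   0.8789   0.9844   0.9983   0.9999   1.0000

Because

\[ \frac{\|\mathbf{x}^{(5)} - \mathbf{x}^{(4)}\|_\infty}{\|\mathbf{x}^{(5)}\|_\infty} = \frac{0.0008}{2.000} = 4 \times 10^{-4}, \]

$\mathbf{x}^{(5)}$ is accepted as a reasonable approximation to the solution. Note that Jacobi's method in
Example 1 required twice as many iterations for the same accuracy.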
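A minimal Python sketch of the Gauss-Seidel sweep in Eq. (7.8), applied to the system of Example 3, is given here for comparison with the Jacobi iteration. The function name `gauss_seidel` and the parameters `tol` and `max_iter` are assumptions of this sketch; the only structural difference from a Jacobi sweep is that the vector is updated in place, so components with index below i are already the new values.

```python
def gauss_seidel(A, b, x0, tol=1e-3, max_iter=100):
    """Gauss-Seidel iteration, Eq. (7.8), with a relative l-infinity stopping test."""
    n = len(b)
    x = list(x0)
    for k in range(1, max_iter + 1):
        x_old = list(x)
        for i in range(n):
            # x[j] with j < i is already the kth iterate; x[j] with j > i is still the (k-1)st.
            s = sum(A[i][j] * x[j] for j in range(n) if j != i)
            x[i] = (b[i] - s) / A[i][i]
        diff = max(abs(x[i] - x_old[i]) for i in range(n))
        if diff / max(abs(v) for v in x) < tol:
            return x, k
    raise RuntimeError("Maximum number of iterations exceeded")


# The system of Example 3.
A = [[10, -1, 2, 0], [-1, 11, -1, 3], [2, -1, 10, -1], [0, 3, -1, 8]]
b = [6, 25, -11, 15]
x, iterations = gauss_seidel(A, b, [0.0, 0.0, 0.0, 0.0])
print(iterations, [round(v, 4) for v in x])
```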


To write the Gauss-Seidel method in matrix form, multiply both sides of Eq. (7.8) by
$a_{ii}$ and collect all kth iterate terms, to give

\[ a_{i1} x_1^{(k)} + a_{i2} x_2^{(k)} + \cdots + a_{ii} x_i^{(k)} = -a_{i,i+1} x_{i+1}^{(k-1)} - \cdots - a_{in} x_n^{(k-1)} + b_i, \]

for each i = 1, 2, ..., n. Writing all n equations gives

\[
\begin{aligned}
a_{11} x_1^{(k)} &= -a_{12} x_2^{(k-1)} - a_{13} x_3^{(k-1)} - \cdots - a_{1n} x_n^{(k-1)} + b_1, \\
a_{21} x_1^{(k)} + a_{22} x_2^{(k)} &= -a_{23} x_3^{(k-1)} - \cdots - a_{2n} x_n^{(k-1)} + b_2, \\
&\;\;\vdots \\
a_{n1} x_1^{(k)} + a_{n2} x_2^{(k)} + \cdots + a_{nn} x_n^{(k)} &= b_n;
\end{aligned}
\]

with the definitions of D, L, and U given previously, we have the Gauss-Seidel method
represented by

\[ (D - L)\mathbf{x}^{(k)} = U\mathbf{x}^{(k-1)} + \mathbf{b} \]

and

\[ \mathbf{x}^{(k)} = (D - L)^{-1} U \mathbf{x}^{(k-1)} + (D - L)^{-1} \mathbf{b}, \quad \text{for each } k = 1, 2, \ldots. \tag{7.9} \]

Letting $T_g = (D - L)^{-1} U$ and $\mathbf{c}_g = (D - L)^{-1} \mathbf{b}$ gives the Gauss-Seidel technique the form

\[ \mathbf{x}^{(k)} = T_g \mathbf{x}^{(k-1)} + \mathbf{c}_g. \tag{7.10} \]

For the lower-triangular matrix D − L to be nonsingular, it is necessary and sufficient that
$a_{ii} \neq 0$, for each i = 1, 2, ..., n.
Algorithm 7.2 implements the Gauss-Seidel method.


Gauss-Seidel Iterative

To solve $A\mathbf{x} = \mathbf{b}$ given an initial approximation $\mathbf{x}^{(0)}$:


INPUT the number of equations and unknowns n; the entries $a_{ij}$, $1 \leq i, j \leq n$ of the
matrix A; the entries $b_i$, $1 \leq i \leq n$ of b; the entries $XO_i$, $1 \leq i \leq n$ of $XO = \mathbf{x}^{(0)}$; tolerance
TOL; maximum number of iterations N.


OUTPUT the approximate solution $x_1, \ldots, x_n$ or a message that the number of iterations
was exceeded.


Step 1 Set k = 1.
Step 2 While ($k \leq N$) do Steps 3-6.
    Step 3 For i = 1, ..., n
        set $x_i = \dfrac{1}{a_{ii}} \left[ -\sum_{j=1}^{i-1} a_{ij} x_j - \sum_{j=i+1}^{n} a_{ij} XO_j + b_i \right]$.
    Step 4 If $\|\mathbf{x} - \mathbf{XO}\| < TOL$ then OUTPUT $(x_1, \ldots, x_n)$;
        (The procedure was successful.)
        STOP.
    Step 5 Set k = k + 1.
    Step 6 For i = 1, ..., n set $XO_i = x_i$.
Step 7 OUTPUT ('Maximum number of iterations exceeded');
    (The procedure was unsuccessful.)
    STOP.


The comments following Algorithm 7.1 regarding reordering and stopping criteria also 
apply to the Gauss-Seidel Algorithm 7.2. 

The results of Examples 1 and 3 appear to imply that the Gauss-Seidel method is
superior to the Jacobi method. This is almost always true, but there are linear systems for
which the Jacobi method converges and the Gauss-Seidel method does not (see Exercises
9 and 10).
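To make the last remark concrete, the following sketch applies both methods to the 3 × 3 system of Exercise 10 (solution $(1, 2, -1)^t$), for which that exercise asks the reader to show $\rho(T_j) = 0$ and $\rho(T_g) = 2$. The helper names are hypothetical, and exact rational arithmetic is used so that no rounding obscures the behavior.

```python
from fractions import Fraction

A = [[1, 2, -2], [1, 1, 1], [2, 2, 1]]
b = [7, 2, 5]
exact = [1, 2, -1]
n = len(b)


def jacobi_step(x):
    # All components are computed from the previous iterate.
    return [(b[i] - sum(A[i][j] * x[j] for j in range(n) if j != i)) / A[i][i]
            for i in range(n)]


def gauss_seidel_step(x):
    # Components are overwritten in place, so new values are used immediately.
    x = list(x)
    for i in range(n):
        x[i] = (b[i] - sum(A[i][j] * x[j] for j in range(n) if j != i)) / A[i][i]
    return x


xj = [Fraction(0)] * n
for _ in range(3):
    xj = jacobi_step(xj)

xg = [Fraction(0)] * n
for _ in range(25):
    xg = gauss_seidel_step(xg)

gs_error = max(abs(xg[i] - exact[i]) for i in range(n))
print([int(v) for v in xj])   # Jacobi iterate after three steps
print(float(gs_error))        # Gauss-Seidel error after 25 steps
```

For this particular matrix the Jacobi iterate agrees with the solution after three steps, while the Gauss-Seidel error keeps growing.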
The NumericalAnalysis subpackage of the Maple Student package implements the
Gauss-Seidel method in a manner similar to that of the Jacobi iterative method. The results
in Table 7.2 are obtained by loading both NumericalAnalysis and LinearAlgebra, the matrix
A, and then using the command


IterativeApproximate(A, initialapprox = Vector([0., 0., 0., 0.]), tolerance = 10^(-3), maxiterations
= 20, stoppingcriterion = relative(infinity), method = gaussseidel, output = approximates)


If we change the final option to output = [approximates, distances], the output also
includes the $l_\infty$ distances between the approximations and the actual solution.


General Iteration Methods 
To study the convergence of general iteration techniques, we need to analyze the formula

\[ \mathbf{x}^{(k)} = T\mathbf{x}^{(k-1)} + \mathbf{c}, \quad \text{for each } k = 1, 2, \ldots, \]

where $\mathbf{x}^{(0)}$ is arbitrary. The next lemma and Theorem 7.17 on page 449 provide the key for
this study.


Lemma 7.18 If the spectral radius satisfies $\rho(T) < 1$, then $(I - T)^{-1}$ exists, and

\[ (I - T)^{-1} = I + T + T^2 + \cdots = \sum_{j=0}^{\infty} T^j. \]

Proof Because $T\mathbf{x} = \lambda\mathbf{x}$ is true precisely when $(I - T)\mathbf{x} = (1 - \lambda)\mathbf{x}$, we have $\lambda$ as an
eigenvalue of T precisely when $1 - \lambda$ is an eigenvalue of $I - T$. But $|\lambda| \leq \rho(T) < 1$, so
$\lambda = 1$ is not an eigenvalue of T, and 0 cannot be an eigenvalue of $I - T$. Hence, $(I - T)^{-1}$
exists.

Let $S_m = I + T + T^2 + \cdots + T^m$. Then

\[ (I - T)S_m = (I + T + T^2 + \cdots + T^m) - (T + T^2 + \cdots + T^{m+1}) = I - T^{m+1}, \]

and, since T is convergent, Theorem 7.17 implies that

\[ \lim_{m \to \infty} (I - T)S_m = \lim_{m \to \infty} (I - T^{m+1}) = I. \]

Thus, $(I - T)^{-1} = \lim_{m \to \infty} S_m = I + T + T^2 + \cdots = \sum_{j=0}^{\infty} T^j$.
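The identity $(I - T)S_m = I - T^{m+1}$ used in the proof can be checked numerically. The sketch below does this for the Jacobi matrix T of Example 2, whose $l_\infty$ norm is 0.5, so its spectral radius is below 1. The helper names `matmul`, `identity`, and `neumann_partial_sum` are illustrative assumptions, and plain nested lists are used to keep the sketch dependency-free.

```python
def matmul(X, Y):
    """Product of two square matrices stored as nested lists."""
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def identity(n):
    return [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]


def neumann_partial_sum(T, m):
    """Return S_m = I + T + T^2 + ... + T^m."""
    n = len(T)
    S, P = identity(n), identity(n)
    for _ in range(m):
        P = matmul(P, T)  # P holds the next power of T
        S = [[S[i][j] + P[i][j] for j in range(n)] for i in range(n)]
    return S


# Jacobi iteration matrix from Example 2.
T = [[0, 1 / 10, -1 / 5, 0],
     [1 / 11, 0, 1 / 11, -3 / 11],
     [-1 / 5, 1 / 10, 0, 1 / 10],
     [0, -3 / 8, 1 / 8, 0]]

n = len(T)
I = identity(n)
I_minus_T = [[I[i][j] - T[i][j] for j in range(n)] for i in range(n)]
product = matmul(I_minus_T, neumann_partial_sum(T, 60))
# Largest entry of (I - T) S_60 - I, which equals -T^61 up to rounding.
deviation = max(abs(product[i][j] - I[i][j]) for i in range(n) for j in range(n))
print(deviation)
```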
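Theorem 7.19 For any $\mathbf{x}^{(0)} \in \mathbb{R}^n$, the sequence $\{\mathbf{x}^{(k)}\}_{k=0}^{\infty}$ defined by

\[ \mathbf{x}^{(k)} = T\mathbf{x}^{(k-1)} + \mathbf{c}, \quad \text{for each } k \geq 1, \tag{7.11} \]

converges to the unique solution of $\mathbf{x} = T\mathbf{x} + \mathbf{c}$ if and only if $\rho(T) < 1$.

Proof First assume that $\rho(T) < 1$. Then,

\[
\begin{aligned}
\mathbf{x}^{(k)} &= T\mathbf{x}^{(k-1)} + \mathbf{c} \\
&= T(T\mathbf{x}^{(k-2)} + \mathbf{c}) + \mathbf{c} \\
&= T^2 \mathbf{x}^{(k-2)} + (T + I)\mathbf{c} \\
&\;\;\vdots \\
&= T^k \mathbf{x}^{(0)} + (T^{k-1} + \cdots + T + I)\mathbf{c}.
\end{aligned}
\]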
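Because $\rho(T) < 1$, Theorem 7.17 implies that T is convergent, and

\[ \lim_{k \to \infty} T^k \mathbf{x}^{(0)} = \mathbf{0}. \]

Lemma 7.18 implies that

\[ \lim_{k \to \infty} \mathbf{x}^{(k)} = \lim_{k \to \infty} T^k \mathbf{x}^{(0)} + \left( \sum_{j=0}^{\infty} T^j \right) \mathbf{c} = \mathbf{0} + (I - T)^{-1}\mathbf{c} = (I - T)^{-1}\mathbf{c}. \]

Hence, the sequence $\{\mathbf{x}^{(k)}\}$ converges to the vector $\mathbf{x} \equiv (I - T)^{-1}\mathbf{c}$ and $\mathbf{x} = T\mathbf{x} + \mathbf{c}$.

To prove the converse, we will show that for any $\mathbf{z} \in \mathbb{R}^n$, we have $\lim_{k \to \infty} T^k \mathbf{z} = \mathbf{0}$.
By Theorem 7.17, this is equivalent to $\rho(T) < 1$.

Let $\mathbf{z}$ be an arbitrary vector, and $\mathbf{x}$ be the unique solution to $\mathbf{x} = T\mathbf{x} + \mathbf{c}$. Define
$\mathbf{x}^{(0)} = \mathbf{x} - \mathbf{z}$, and, for $k \geq 1$, $\mathbf{x}^{(k)} = T\mathbf{x}^{(k-1)} + \mathbf{c}$. Then $\{\mathbf{x}^{(k)}\}$ converges to $\mathbf{x}$. Also,

\[ \mathbf{x} - \mathbf{x}^{(k)} = (T\mathbf{x} + \mathbf{c}) - (T\mathbf{x}^{(k-1)} + \mathbf{c}) = T(\mathbf{x} - \mathbf{x}^{(k-1)}), \]

so

\[ \mathbf{x} - \mathbf{x}^{(k)} = T(\mathbf{x} - \mathbf{x}^{(k-1)}) = T^2(\mathbf{x} - \mathbf{x}^{(k-2)}) = \cdots = T^k(\mathbf{x} - \mathbf{x}^{(0)}) = T^k \mathbf{z}. \]

Hence $\lim_{k \to \infty} T^k \mathbf{z} = \lim_{k \to \infty} T^k(\mathbf{x} - \mathbf{x}^{(0)}) = \lim_{k \to \infty} (\mathbf{x} - \mathbf{x}^{(k)}) = \mathbf{0}$.
But $\mathbf{z} \in \mathbb{R}^n$ was arbitrary, so by Theorem 7.17, T is convergent and $\rho(T) < 1$.

The proof of the following corollary is similar to the proofs in Corollary 2.5 on page 62.
It is considered in Exercise 13.

Corollary 7.20

If $\|T\| < 1$ for any natural matrix norm and $\mathbf{c}$ is a given vector, then the sequence $\{\mathbf{x}^{(k)}\}_{k=0}^{\infty}$
defined by $\mathbf{x}^{(k)} = T\mathbf{x}^{(k-1)} + \mathbf{c}$ converges, for any $\mathbf{x}^{(0)} \in \mathbb{R}^n$, to a vector $\mathbf{x} \in \mathbb{R}^n$, with
$\mathbf{x} = T\mathbf{x} + \mathbf{c}$, and the following error bounds hold:

(i) $\|\mathbf{x} - \mathbf{x}^{(k)}\| \leq \|T\|^k \|\mathbf{x}^{(0)} - \mathbf{x}\|$;

(ii) $\|\mathbf{x} - \mathbf{x}^{(k)}\| \leq \dfrac{\|T\|^k}{1 - \|T\|} \|\mathbf{x}^{(1)} - \mathbf{x}^{(0)}\|$.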
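The two bounds of Corollary 7.20 can be checked numerically for the Jacobi iteration of Example 1, using the $l_\infty$ norm. In the sketch below the helper names are illustrative assumptions; $\|T_j\|_\infty$ is computed as the largest absolute row sum of $-a_{ij}/a_{ii}$ over $j \neq i$.

```python
def jacobi_step(A, b, x):
    n = len(b)
    return [(b[i] - sum(A[i][j] * x[j] for j in range(n) if j != i)) / A[i][i]
            for i in range(n)]


def inf_norm(v):
    return max(abs(t) for t in v)


A = [[10, -1, 2, 0], [-1, 11, -1, 3], [2, -1, 10, -1], [0, 3, -1, 8]]
b = [6, 25, -11, 15]
exact = [1.0, 2.0, -1.0, 1.0]
n = len(b)

# l-infinity norm of T_j: largest row sum of |a_ij / a_ii| over j != i.
norm_T = max(sum(abs(A[i][j] / A[i][i]) for j in range(n) if j != i)
             for i in range(n))

x0 = [0.0] * n
x1 = jacobi_step(A, b, x0)
e0 = inf_norm([exact[i] - x0[i] for i in range(n)])   # ||x^(0) - x||
d1 = inf_norm([x1[i] - x0[i] for i in range(n)])      # ||x^(1) - x^(0)||

x = x0
bounds_hold = True
for k in range(1, 11):
    x = jacobi_step(A, b, x)
    err = inf_norm([exact[i] - x[i] for i in range(n)])
    bound_i = norm_T ** k * e0
    bound_ii = norm_T ** k / (1 - norm_T) * d1
    bounds_hold = bounds_hold and err <= bound_i and err <= bound_ii

print(norm_T, bounds_hold)
```

Bound (ii) is the one usable in practice, since it does not require knowing the exact solution.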
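We have seen that the Jacobi and Gauss-Seidel iterative techniques can be written

\[ \mathbf{x}^{(k)} = T_j \mathbf{x}^{(k-1)} + \mathbf{c}_j \quad \text{and} \quad \mathbf{x}^{(k)} = T_g \mathbf{x}^{(k-1)} + \mathbf{c}_g, \]

using the matrices

\[ T_j = D^{-1}(L + U) \quad \text{and} \quad T_g = (D - L)^{-1} U. \]

If $\rho(T_j)$ or $\rho(T_g)$ is less than 1, then the corresponding sequence $\{\mathbf{x}^{(k)}\}_{k=0}^{\infty}$ will converge to
the solution $\mathbf{x}$ of $A\mathbf{x} = \mathbf{b}$. For example, the Jacobi scheme has

\[ \mathbf{x}^{(k)} = D^{-1}(L + U)\mathbf{x}^{(k-1)} + D^{-1}\mathbf{b}, \]

and, if $\{\mathbf{x}^{(k)}\}_{k=0}^{\infty}$ converges to $\mathbf{x}$, then

\[ \mathbf{x} = D^{-1}(L + U)\mathbf{x} + D^{-1}\mathbf{b}. \]

This implies that

\[ D\mathbf{x} = (L + U)\mathbf{x} + \mathbf{b} \quad \text{and} \quad (D - L - U)\mathbf{x} = \mathbf{b}. \]

Since $D - L - U = A$, the solution $\mathbf{x}$ satisfies $A\mathbf{x} = \mathbf{b}$.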

We can now give easily verified sufficiency conditions for convergence of the Jacobi
and Gauss-Seidel methods. (To prove convergence for the Jacobi scheme see Exercise 14,
and for the Gauss-Seidel scheme see [Or2], p. 120.)

Theorem 7.21

If A is strictly diagonally dominant, then for any choice of $\mathbf{x}^{(0)}$, both the Jacobi and
Gauss-Seidel methods give sequences $\{\mathbf{x}^{(k)}\}_{k=0}^{\infty}$ that converge to the unique solution of
$A\mathbf{x} = \mathbf{b}$.

The relationship of the rapidity of convergence to the spectral radius of the iteration
matrix T can be seen from Corollary 7.20. The inequalities hold for any natural matrix
norm, so it follows from the statement after Theorem 7.15 on page 446 that

\[ \|\mathbf{x}^{(k)} - \mathbf{x}\| \approx \rho(T)^k \|\mathbf{x}^{(0)} - \mathbf{x}\|. \tag{7.12} \]

Thus we would like to select the iterative technique with minimal $\rho(T) < 1$ for a particular
system $A\mathbf{x} = \mathbf{b}$. No general results exist to tell which of the two techniques, Jacobi or Gauss-Seidel,
will be most successful for an arbitrary linear system. In special cases, however, the
answer is known, as is demonstrated in the following theorem. The proof of this result can
be found in [Y], pp. 120-127.

Theorem 7.22 (Stein-Rosenberg)

If $a_{ij} \leq 0$, for each $i \neq j$ and $a_{ii} > 0$, for each i = 1, 2, ..., n, then one and only one of the
following statements holds:

(i) $0 \leq \rho(T_g) < \rho(T_j) < 1$;
(ii) $1 < \rho(T_j) < \rho(T_g)$;
(iii) $\rho(T_j) = \rho(T_g) = 0$;
(iv) $\rho(T_j) = \rho(T_g) = 1$.

For the special case described in Theorem 7.22, we see from part (i) that when one
method gives convergence, then both give convergence, and the Gauss-Seidel method converges
faster than the Jacobi method. Part (ii) indicates that when one method diverges then
both diverge, and the divergence is more pronounced for the Gauss-Seidel method.
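The hypothesis of Theorem 7.21 is simple to test before iterating. The sketch below, with a hypothetical helper name, checks strict diagonal dominance row by row. It is applied to the matrix of Example 1 and to the matrix of Exercise 9, which is a reminder that the condition is sufficient but not necessary: Exercise 9 asks the reader to show that Gauss-Seidel still converges for that system.

```python
def is_strictly_diagonally_dominant(A):
    """True when |a_ii| exceeds the sum of |a_ij| over j != i in every row."""
    n = len(A)
    return all(
        abs(A[i][i]) > sum(abs(A[i][j]) for j in range(n) if j != i)
        for i in range(n)
    )


example_1 = [[10, -1, 2, 0], [-1, 11, -1, 3], [2, -1, 10, -1], [0, 3, -1, 8]]
exercise_9 = [[2, -1, 1], [2, 2, 2], [-1, -1, 2]]

dominant_1 = is_strictly_diagonally_dominant(example_1)
dominant_9 = is_strictly_diagonally_dominant(exercise_9)
print(dominant_1, dominant_9)
```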


EXERCISE SET 7.3

1. Find the first two iterations of the Jacobi method for the following linear systems, using $\mathbf{x}^{(0)} = \mathbf{0}$:

a. $3x_1 - x_2 + x_3 = 1$,
   $3x_1 + 6x_2 + 2x_3 = 0$,
   $3x_1 + 3x_2 + 7x_3 = 4$.

b. $10x_1 - x_2 = 9$,
   $-x_1 + 10x_2 - 2x_3 = 7$,
   $-2x_2 + 10x_3 = 6$.

c. $10x_1 + 5x_2 = 6$,
   $5x_1 + 10x_2 - 4x_3 = 25$,
   $-4x_2 + 8x_3 - x_4 = -11$,
   $-x_3 + 5x_4 = -11$.

d. $4x_1 + x_2 + x_3 + x_5 = 6$,
   $-x_1 - 3x_2 + x_3 + x_4 = 6$,
   $2x_1 + x_2 + 5x_3 - x_4 - x_5 = 6$,
   $-x_1 - x_2 - x_3 + 4x_4 = 6$,
   $2x_2 - x_3 + x_4 + 4x_5 = 6$.

2. Find the first two iterations of the Jacobi method for the following linear systems, using $\mathbf{x}^{(0)} = \mathbf{0}$:

a. $4x_1 + x_2 - x_3 = 5$,
   $-x_1 + 3x_2 + x_3 = -4$,
   $2x_1 + 2x_2 + 5x_3 = 1$.

b. $-2x_1 + x_2 + \tfrac{1}{2} x_3 = 4$,
   $x_1 - 2x_2 - \tfrac{1}{2} x_3 = -4$,
   $x_2 + 2x_3 = 0$.

c. $4x_1 + x_2 - x_3 + x_4 = -2$,
   $x_1 + 4x_2 - x_3 - x_4 = -1$,
   $-x_1 - x_2 + 5x_3 + x_4 = 0$,
   $x_1 - x_2 + x_3 + 3x_4 = 1$.

d. $4x_1 - x_2 - x_4 = 0$,
   $-x_1 + 4x_2 - x_3 - x_5 = 5$,
   $-x_2 + 4x_3 - x_6 = 0$,
   $-x_1 + 4x_4 - x_5 = 6$,
   $-x_2 - x_4 + 4x_5 - x_6 = -2$,
   $-x_3 - x_5 + 4x_6 = 6$.


Copyright 2010 Cengage Learning. All Rights Reserved. May not be copied, scanned, or duplicated, in whole or in part. Due to electronic rights, some third party content may be suppressed from the eBook and/or eChapter(s). 


Editorial review has deemed that any suppressed content does not materially affect the overall learning experience. Cengage Learning reserves the right to remove additional content at any time if subsequent rights restrictions require it. 




3. Repeat Exercise 1 using the Gauss-Seidel method.
4. Repeat Exercise 2 using the Gauss-Seidel method.
5. Use the Jacobi method to solve the linear systems in Exercise 1, with TOL = $10^{-3}$ in the $l_\infty$ norm.
6. Use the Jacobi method to solve the linear systems in Exercise 2, with TOL = $10^{-3}$ in the $l_\infty$ norm.
7. Use the Gauss-Seidel method to solve the linear systems in Exercise 1, with TOL = $10^{-3}$ in the $l_\infty$ norm.
8. Use the Gauss-Seidel method to solve the linear systems in Exercise 2, with TOL = $10^{-3}$ in the $l_\infty$ norm.
9. The linear system

\[
\begin{aligned}
2x_1 - x_2 + x_3 &= -1, \\
2x_1 + 2x_2 + 2x_3 &= 4, \\
-x_1 - x_2 + 2x_3 &= -5
\end{aligned}
\]

has the solution $(1, 2, -1)^t$.

a. Show that $\rho(T_j) = \frac{\sqrt{5}}{2} > 1$.

b. Show that the Jacobi method with $\mathbf{x}^{(0)} = \mathbf{0}$ fails to give a good approximation after 25 iterations.

c. Show that $\rho(T_g) = \frac{1}{2}$.

d. Use the Gauss-Seidel method with $\mathbf{x}^{(0)} = \mathbf{0}$ to approximate the solution to the linear system to
within $10^{-5}$ in the $l_\infty$ norm.

10. The linear system

\[
\begin{aligned}
x_1 + 2x_2 - 2x_3 &= 7, \\
x_1 + x_2 + x_3 &= 2, \\
2x_1 + 2x_2 + x_3 &= 5
\end{aligned}
\]

has the solution $(1, 2, -1)^t$.

a. Show that $\rho(T_j) = 0$.

b. Use the Jacobi method with $\mathbf{x}^{(0)} = \mathbf{0}$ to approximate the solution to the linear system to within
$10^{-5}$ in the $l_\infty$ norm.

c. Show that $\rho(T_g) = 2$.

d. Show that the Gauss-Seidel method applied as in part (b) fails to give a good approximation in
25 iterations.


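11. The linear system

\[
\begin{aligned}
x_1 - x_3 &= 0.2, \\
-\tfrac{1}{2} x_1 + x_2 - \tfrac{1}{4} x_3 &= -1.425, \\
x_1 - \tfrac{1}{2} x_2 + x_3 &= 2
\end{aligned}
\]

has the solution $(0.9, -0.8, 0.7)^t$.

a. Is the coefficient matrix

\[
A = \begin{bmatrix}
1 & 0 & -1 \\
-\tfrac{1}{2} & 1 & -\tfrac{1}{4} \\
1 & -\tfrac{1}{2} & 1
\end{bmatrix}
\]

strictly diagonally dominant?

b. Compute the spectral radius of the Gauss-Seidel matrix $T_g$.

c. Use the Gauss-Seidel iterative method to approximate the solution to the linear system with a
tolerance of $10^{-7}$ and a maximum of 300 iterations.

d. What happens in part (c) when the system is changed to

\[
\begin{aligned}
x_1 - 2x_3 &= 0.2, \\
-\tfrac{1}{2} x_1 + x_2 - \tfrac{1}{4} x_3 &= -1.425, \\
x_1 - \tfrac{1}{2} x_2 + x_3 &= 2.
\end{aligned}
\]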
12. Repeat Exercise 11 using the Jacobi method.

13. a. Prove that

\[ \|\mathbf{x}^{(k)} - \mathbf{x}\| \leq \|T\|^k \|\mathbf{x}^{(0)} - \mathbf{x}\| \quad \text{and} \quad \|\mathbf{x}^{(k)} - \mathbf{x}\| \leq \frac{\|T\|^k}{1 - \|T\|} \|\mathbf{x}^{(1)} - \mathbf{x}^{(0)}\|, \]

where T is an n × n matrix with $\|T\| < 1$ and

\[ \mathbf{x}^{(k)} = T\mathbf{x}^{(k-1)} + \mathbf{c}, \quad k = 1, 2, \ldots, \]

with $\mathbf{x}^{(0)}$ arbitrary, $\mathbf{c} \in \mathbb{R}^n$, and $\mathbf{x} = T\mathbf{x} + \mathbf{c}$.

b. Apply the bounds to Exercise 1, when possible, using the $l_\infty$ norm.

14. Show that if A is strictly diagonally dominant, then $\|T_j\|_\infty < 1$.
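15. Use (a) the Jacobi and (b) the Gauss-Seidel methods to solve the linear system $A\mathbf{x} = \mathbf{b}$ to within $10^{-5}$
in the $l_\infty$ norm, where the entries of A are

\[
a_{ij} = \begin{cases}
2i, & \text{when } j = i \text{ and } i = 1, 2, \ldots, 80, \\[4pt]
0.5i, & \text{when } \begin{cases} j = i + 2 \text{ and } i = 1, 2, \ldots, 78, \\ j = i - 2 \text{ and } i = 3, 4, \ldots, 80, \end{cases} \\[4pt]
0.25i, & \text{when } \begin{cases} j = i + 4 \text{ and } i = 1, 2, \ldots, 76, \\ j = i - 4 \text{ and } i = 5, 6, \ldots, 80, \end{cases} \\[4pt]
0, & \text{otherwise},
\end{cases}
\]

and those of b are $b_i = \pi$, for each i = 1, 2, ..., 80.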


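16. Suppose that an object can be at any one of n + 1 equally spaced points $x_0, x_1, \ldots, x_n$. When an object
is at location $x_i$, it is equally likely to move to either $x_{i-1}$ or $x_{i+1}$ and cannot directly move to any
other location. Consider the probabilities $\{P_i\}_{i=0}^{n}$ that an object starting at location $x_i$ will reach the
left endpoint $x_0$ before reaching the right endpoint $x_n$. Clearly, $P_0 = 1$ and $P_n = 0$. Since the object
can move to $x_i$ only from $x_{i-1}$ or $x_{i+1}$ and does so with probability $\frac{1}{2}$ for each of these locations,

\[ P_i = \tfrac{1}{2} P_{i-1} + \tfrac{1}{2} P_{i+1}, \quad \text{for each } i = 1, 2, \ldots, n - 1. \]

a. Show that

\[
\begin{bmatrix}
1 & -\tfrac{1}{2} & 0 & \cdots & 0 \\
-\tfrac{1}{2} & 1 & -\tfrac{1}{2} & \ddots & \vdots \\
0 & \ddots & \ddots & \ddots & 0 \\
\vdots & \ddots & -\tfrac{1}{2} & 1 & -\tfrac{1}{2} \\
0 & \cdots & 0 & -\tfrac{1}{2} & 1
\end{bmatrix}
\begin{bmatrix} P_1 \\ P_2 \\ \vdots \\ P_{n-1} \end{bmatrix}
=
\begin{bmatrix} \tfrac{1}{2} \\ 0 \\ \vdots \\ 0 \end{bmatrix}.
\]

b. Solve this system using n = 10, 50, and 100.

c. Change the probabilities to $\alpha$ and $1 - \alpha$ for movement to the left and right, respectively, and
derive the linear system similar to the one in part (a).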


d. Repeat part (b) witha = ‘ 
17. Suppose that A is positive definite.

a. Show that we can write $A = D - L - L^t$, where D is diagonal with $d_{ii} > 0$ for each $1 \leq i \leq n$
and L is lower triangular. Further, show that $D - L$ is nonsingular.

b. Let $T_g = (D - L)^{-1} L^t$ and $P = A - T_g^t A T_g$. Show that P is symmetric.

c. Show that $T_g$ can also be written as $T_g = I - (D - L)^{-1} A$.

d. Let $Q = (D - L)^{-1} A$. Show that $T_g = I - Q$ and $P = Q^t [A Q^{-1} - A + (Q^t)^{-1} A] Q$.

e. Show that $P = Q^t D Q$ and P is positive definite.

f. Let $\lambda$ be an eigenvalue of $T_g$ with eigenvector $\mathbf{x} \neq \mathbf{0}$. Use part (b) to show that $\mathbf{x}^t P \mathbf{x} > 0$ implies
that $|\lambda| < 1$.

g. Show that $T_g$ is convergent and prove that the Gauss-Seidel method converges.


18. The forces on the bridge truss described in the opening to this chapter satisfy the equations in the 
following table: 


Joint Horizontal Component Vertical Component 
® -Fi+?fi+f=0 Phi, =0 
@ -8fA+8h=0 -PA-fA-7h=0 
6) —frot+ fs=0 Js — 10,000 = 0 
® —4 f, — fs=0 1, — F; =0 


F-17100 *# 1 0 0 | 
i =f oo: 2 6 0 0 | Fi ] [ 0 ] 
: F, 0 
0 o -1 7 0 0 | F, | | oo | 
0 oO 0 = 0 1 5 0 fi [_ 0 . 
0 0 0 O -1 0 0 1 fr 0 | 
0 0 0 OO 0 1 0 0 fs mee 
5 B fa 0 
0 0 0 -# G 2 - | 0 
Lo 0 0 0 0 0 -#8 -1 


a. Explain why the system of equations was reordered. 


b. Approximate the solution of the resulting linear system to within 10~? in the /,, norm using 
as initial approximation the vector all of whose entries are 1s with (i) the Jacobi method and 
(ii) the Gauss-Seidel method. 


7.4 Relaxation Techniques for Solving Linear Systems


We saw in Section 7.3 that the rate of convergence of an iterative technique depends on the 
spectral radius of the matrix associated with the method. One way to select a procedure to 
accelerate convergence is to choose a method whose associated matrix has minimal spectral 
radius. Before describing a procedure for selecting such a method, we need to introduce a 
new means of measuring the amount by which an approximation to the solution to a linear 
system differs from the true solution to the system. The method makes use of the vector 
described in the following definition. 


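Definition 7.23 Suppose $\tilde{\mathbf{x}} \in \mathbb{R}^n$ is an approximation to the solution of the linear system defined by $A\mathbf{x} = \mathbf{b}$.
The residual vector for $\tilde{\mathbf{x}}$ with respect to this system is $\mathbf{r} = \mathbf{b} - A\tilde{\mathbf{x}}$.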
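As a small illustration of Definition 7.23, the residual can be computed with one matrix-vector product. The sketch below uses the system from Example 1 of Section 7.3; the function name `residual` is an assumption made for this illustration.

```python
def residual(A, b, x_approx):
    """Return r = b - A x_approx for A stored as a list of rows."""
    n = len(b)
    return [b[i] - sum(A[i][j] * x_approx[j] for j in range(n)) for i in range(n)]


A = [[10, -1, 2, 0], [-1, 11, -1, 3], [2, -1, 10, -1], [0, 3, -1, 8]]
b = [6, 25, -11, 15]

r_exact = residual(A, b, [1, 2, -1, 1])   # residual of the exact solution
r_zero = residual(A, b, [0, 0, 0, 0])     # residual of the zero vector is b itself
print(r_exact, r_zero)
```

The residual vanishes exactly when the approximation solves the system, which is why it can be monitored without knowing the true solution.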


The word residual means what is left over, which is an appropriate name for this vector.

In procedures such as the Jacobi or Gauss-Seidel methods, a residual vector is associated
with each calculation of an approximate component to the solution vector. The true objective
is to generate a sequence of approximations that will cause the residual vectors to converge
rapidly to zero. Suppose we let

\[ \mathbf{r}_i^{(k)} = (r_{1i}^{(k)}, r_{2i}^{(k)}, \ldots, r_{ni}^{(k)})^t \]

denote the residual vector for the Gauss-Seidel method corresponding to the approximate
solution vector $\mathbf{x}_i^{(k)}$ defined by

\[ \mathbf{x}_i^{(k)} = (x_1^{(k)}, x_2^{(k)}, \ldots, x_{i-1}^{(k)}, x_i^{(k-1)}, \ldots, x_n^{(k-1)})^t. \]

The mth component of $\mathbf{r}_i^{(k)}$ is

\[ r_{mi}^{(k)} = b_m - \sum_{j=1}^{i-1} a_{mj} x_j^{(k)} - \sum_{j=i}^{n} a_{mj} x_j^{(k-1)}, \tag{7.13} \]

or, equivalently,

\[ r_{mi}^{(k)} = b_m - \sum_{j=1}^{i-1} a_{mj} x_j^{(k)} - \sum_{j=i+1}^{n} a_{mj} x_j^{(k-1)} - a_{mi} x_i^{(k-1)}, \]

for each m = 1, 2, ..., n.

In particular, the ith component of $\mathbf{r}_i^{(k)}$ is

\[ r_{ii}^{(k)} = b_i - \sum_{j=1}^{i-1} a_{ij} x_j^{(k)} - \sum_{j=i+1}^{n} a_{ij} x_j^{(k-1)} - a_{ii} x_i^{(k-1)}, \]

so

\[ a_{ii} x_i^{(k-1)} + r_{ii}^{(k)} = b_i - \sum_{j=1}^{i-1} a_{ij} x_j^{(k)} - \sum_{j=i+1}^{n} a_{ij} x_j^{(k-1)}. \tag{7.14} \]

Recall, however, that in the Gauss-Seidel method, $x_i^{(k)}$ is chosen to be

\[ x_i^{(k)} = \frac{1}{a_{ii}} \left[ b_i - \sum_{j=1}^{i-1} a_{ij} x_j^{(k)} - \sum_{j=i+1}^{n} a_{ij} x_j^{(k-1)} \right], \tag{7.15} \]

so Eq. (7.14) can be rewritten as

\[ a_{ii} x_i^{(k-1)} + r_{ii}^{(k)} = a_{ii} x_i^{(k)}. \]

Consequently, the Gauss-Seidel method can be characterized as choosing $x_i^{(k)}$ to satisfy

\[ x_i^{(k)} = x_i^{(k-1)} + \frac{r_{ii}^{(k)}}{a_{ii}}. \tag{7.16} \]

We can derive another connection between the residual vectors and the Gauss-Seidel
technique. Consider the residual vector $\mathbf{r}_{i+1}^{(k)}$, associated with the vector
$\mathbf{x}_{i+1}^{(k)} = (x_1^{(k)}, \ldots, x_i^{(k)}, x_{i+1}^{(k-1)}, \ldots, x_n^{(k-1)})^t$. By Eq. (7.13) the ith component of $\mathbf{r}_{i+1}^{(k)}$ is

\[
\begin{aligned}
r_{i,i+1}^{(k)} &= b_i - \sum_{j=1}^{i} a_{ij} x_j^{(k)} - \sum_{j=i+1}^{n} a_{ij} x_j^{(k-1)} \\
&= b_i - \sum_{j=1}^{i-1} a_{ij} x_j^{(k)} - \sum_{j=i+1}^{n} a_{ij} x_j^{(k-1)} - a_{ii} x_i^{(k)}.
\end{aligned}
\]




By the manner in which $x_i^{(k)}$ is defined in Eq. (7.15) we see that $r_{i,i+1}^{(k)} = 0$. In a sense, then,
the Gauss-Seidel technique is characterized by choosing each $x_{i+1}^{(k)}$ in such a way
that the ith component of $\mathbf{r}_{i+1}^{(k)}$ is zero.

Choosing $x_{i+1}^{(k)}$ so that one coordinate of the residual vector is zero, however, is not
necessarily the most efficient way to reduce the norm of the vector $\mathbf{r}_{i+1}^{(k)}$. If we modify the
Gauss-Seidel procedure, as given by Eq. (7.16), to

\[ x_i^{(k)} = x_i^{(k-1)} + \omega \frac{r_{ii}^{(k)}}{a_{ii}}, \tag{7.17} \]

then for certain choices of positive $\omega$ we can reduce the norm of the residual vector and
obtain significantly faster convergence.

Methods involving Eq. (7.17) are called relaxation methods. For choices of $\omega$ with
$0 < \omega < 1$, the procedures are called under-relaxation methods. We will be interested
in choices of $\omega$ with $1 < \omega$, and these are called over-relaxation methods. They are
used to accelerate the convergence for systems that are convergent by the Gauss-Seidel
technique. The methods are abbreviated SOR, for Successive Over-Relaxation, and are
particularly useful for solving the linear systems that occur in the numerical solution of
certain partial-differential equations.

Before illustrating the advantages of the SOR method, we note that by using Eq. (7.14),
we can reformulate Eq. (7.17) for calculation purposes as

\[ x_i^{(k)} = (1 - \omega) x_i^{(k-1)} + \frac{\omega}{a_{ii}} \left[ b_i - \sum_{j=1}^{i-1} a_{ij} x_j^{(k)} - \sum_{j=i+1}^{n} a_{ij} x_j^{(k-1)} \right]. \]

To determine the matrix form of the SOR method, we rewrite this as

\[ a_{ii} x_i^{(k)} + \omega \sum_{j=1}^{i-1} a_{ij} x_j^{(k)} = (1 - \omega) a_{ii} x_i^{(k-1)} - \omega \sum_{j=i+1}^{n} a_{ij} x_j^{(k-1)} + \omega b_i, \]

so that in vector form, we have

\[ (D - \omega L)\mathbf{x}^{(k)} = [(1 - \omega)D + \omega U]\mathbf{x}^{(k-1)} + \omega \mathbf{b}. \]

That is,

\[ \mathbf{x}^{(k)} = (D - \omega L)^{-1} [(1 - \omega)D + \omega U] \mathbf{x}^{(k-1)} + \omega (D - \omega L)^{-1} \mathbf{b}. \tag{7.18} \]

Letting $T_\omega = (D - \omega L)^{-1} [(1 - \omega)D + \omega U]$ and $\mathbf{c}_\omega = \omega (D - \omega L)^{-1} \mathbf{b}$ gives the SOR
technique the form

\[ \mathbf{x}^{(k)} = T_\omega \mathbf{x}^{(k-1)} + \mathbf{c}_\omega. \tag{7.19} \]

Example 1

The linear system $A\mathbf{x} = \mathbf{b}$ given by

\[
\begin{aligned}
4x_1 + 3x_2 &= 24, \\
3x_1 + 4x_2 - x_3 &= 30, \\
-x_2 + 4x_3 &= -24,
\end{aligned}
\]

has the solution $(3, 4, -5)^t$. Compare the iterations from the Gauss-Seidel method and the
SOR method with $\omega = 1.25$ using $\mathbf{x}^{(0)} = (1, 1, 1)^t$ for both methods.
Solution For each k = 1, 2, ..., the equations for the Gauss-Seidel method are

\[
\begin{aligned}
x_1^{(k)} &= -0.75 x_2^{(k-1)} + 6, \\
x_2^{(k)} &= -0.75 x_1^{(k)} + 0.25 x_3^{(k-1)} + 7.5, \\
x_3^{(k)} &= 0.25 x_2^{(k)} - 6,
\end{aligned}
\]

and the equations for the SOR method with $\omega = 1.25$ are

\[
\begin{aligned}
x_1^{(k)} &= -0.25 x_1^{(k-1)} - 0.9375 x_2^{(k-1)} + 7.5, \\
x_2^{(k)} &= -0.9375 x_1^{(k)} - 0.25 x_2^{(k-1)} + 0.3125 x_3^{(k-1)} + 9.375, \\
x_3^{(k)} &= 0.3125 x_2^{(k)} - 0.25 x_3^{(k-1)} - 7.5.
\end{aligned}
\]

The first seven iterates for each method are listed in Tables 7.3 and 7.4. For the iterates
to be accurate to seven decimal places, the Gauss-Seidel method requires 34 iterations, as
opposed to 14 iterations for the SOR method with $\omega = 1.25$.

Table 7.3 (Gauss-Seidel)

k         0       1            2            3            4            5            6            7
x_1^(k)   1    5.250000    3.1406250    3.0878906    3.0549316    3.0343323    3.0214577    3.0134110
x_2^(k)   1    3.812500    3.8828125    3.9267578    3.9542236    3.9713898    3.9821186    3.9888241
x_3^(k)   1   -5.046875   -5.0292969   -5.0183105   -5.0114441   -5.0071526   -5.0044703   -5.0027940

Table 7.4 (SOR with ω = 1.25)

k         0       1            2            3            4            5            6            7
x_1^(k)   1    6.312500    2.6223145    3.1333027    2.9570512    3.0037211    2.9963276    3.0000498
x_2^(k)   1    3.5195313   3.9585266    4.0102646    4.0074838    4.0029250    4.0009262    4.0002586
x_3^(k)   1   -6.6501465  -4.6004238   -5.0966863   -4.9734897   -5.0057135   -4.9982822   -5.0003486

An obvious question to ask is how the appropriate value of w is chosen when the SOR 
method is used. Although no complete answer to this question is known for the general 
n x n linear system, the following results can be used in certain important situations. 


Theorem 7.24 (Kahan)

If $a_{ii} \neq 0$, for each i = 1, 2, ..., n, then $\rho(T_\omega) \geq |\omega - 1|$. This implies that the SOR method
can converge only if $0 < \omega < 2$.


The proof of this theorem is considered in Exercise 9. The proof of the next two results 
can be found in [Or2], pp. 123-133. These results will be used in Chapter 12. 


Theorem 7.25 (Ostrowski-Reich)

If A is a positive definite matrix and $0 < \omega < 2$, then the SOR method converges for any
choice of initial approximate vector $\mathbf{x}^{(0)}$.
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If A is positive definite and tridiagonal, then p(T,) = [ p(T)? < 1, and the optimal choice 
of w for the SOR method is 
2 


14+ /1—-[p@pr 


With this choice of w, we have p(T,,) = w — 1. a 


Example 2 Find the optimal choice of ω for the SOR method for the matrix
\[
A = \begin{bmatrix} 4 & 3 & 0 \\ 3 & 4 & -1 \\ 0 & -1 & 4 \end{bmatrix}.
\]


Solution This matrix is clearly tridiagonal, so we can apply the result in Theorem 7.26 if we
can also show that it is positive definite. Because the matrix is symmetric, Theorem 6.24 on
page 416 states that it is positive definite if and only if all its leading principal submatrices
have a positive determinant. This is easily seen to be the case because
\[
\det(A) = 24, \qquad \det\left(\begin{bmatrix} 4 & 3 \\ 3 & 4 \end{bmatrix}\right) = 7, \qquad\text{and}\qquad \det([4]) = 4.
\]
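Because
\[
T_j = D^{-1}(L + U) =
\begin{bmatrix} \tfrac14 & 0 & 0 \\ 0 & \tfrac14 & 0 \\ 0 & 0 & \tfrac14 \end{bmatrix}
\begin{bmatrix} 0 & -3 & 0 \\ -3 & 0 & 1 \\ 0 & 1 & 0 \end{bmatrix}
= \begin{bmatrix} 0 & -0.75 & 0 \\ -0.75 & 0 & 0.25 \\ 0 & 0.25 & 0 \end{bmatrix},
\]
we have
\[
T_j - \lambda I = \begin{bmatrix} -\lambda & -0.75 & 0 \\ -0.75 & -\lambda & 0.25 \\ 0 & 0.25 & -\lambda \end{bmatrix},
\]
so
\[
\det(T_j - \lambda I) = -\lambda(\lambda^2 - 0.625).
\]
Thus
\[
\rho(T_j) = \sqrt{0.625}
\]
and
\[
\omega = \frac{2}{1 + \sqrt{1 - [\rho(T_j)]^2}} = \frac{2}{1 + \sqrt{1 - 0.625}} \approx 1.24.
\]
This explains the rapid convergence obtained in Example 1 when using ω = 1.25.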
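The arithmetic in Example 2 can be checked with a few lines of Python. This is only a sketch of the formula in Theorem 7.26; the function name `optimal_sor_omega` is an illustrative choice, and the value of ρ(T_j) is taken from the example rather than computed from the matrix.

```python
import math


def optimal_sor_omega(rho_jacobi):
    """Optimal SOR parameter of Theorem 7.26, given rho(T_j) with 0 <= rho < 1."""
    if not 0.0 <= rho_jacobi < 1.0:
        raise ValueError("requires 0 <= rho(T_j) < 1")
    return 2.0 / (1.0 + math.sqrt(1.0 - rho_jacobi ** 2))


rho_j = math.sqrt(0.625)           # spectral radius of T_j found in Example 2
omega = optimal_sor_omega(rho_j)   # approximately 1.24
rho_sor = omega - 1.0              # rho(T_omega) at the optimal omega
rho_gs = rho_j ** 2                # rho(T_g) = [rho(T_j)]^2

print(f"omega = {omega:.4f}")
print(f"rho(T_omega) = {rho_sor:.4f}, rho(T_g) = {rho_gs:.4f}")
```

The smaller spectral radius for the optimal ω, compared with ρ(T_g) for Gauss-Seidel, is consistent with the iteration counts reported for Tables 7.3 and 7.4.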


We close this section with Algorithm 7.3 for the SOR method.

SOR
To solve Ax = b given the parameter ω and an initial approximation $x^{(0)}$:

INPUT the number of equations and unknowns n; the entries $a_{ij}$, 1 ≤ i, j ≤ n, of the
matrix A; the entries $b_i$, 1 ≤ i ≤ n, of b; the entries $XO_i$, 1 ≤ i ≤ n, of XO = $x^{(0)}$; the
parameter ω; tolerance TOL; maximum number of iterations N.

OUTPUT the approximate solution $x_1, \ldots, x_n$ or a message that the number of iterations
was exceeded.

Step 1 Set k = 1.
Step 2 While (k ≤ N) do Steps 3-6.
    Step 3 For i = 1, ..., n
        set
        \[
        x_i = (1 - \omega)XO_i + \frac{1}{a_{ii}}\left[\omega\left(-\sum_{j=1}^{i-1} a_{ij}x_j - \sum_{j=i+1}^{n} a_{ij}XO_j + b_i\right)\right].
        \]
    Step 4 If ||x − XO|| < TOL then OUTPUT ($x_1, \ldots, x_n$);
        (The procedure was successful.)
        STOP.
    Step 5 Set k = k + 1.
    Step 6 For i = 1, ..., n set $XO_i = x_i$.
Step 7 OUTPUT ('Maximum number of iterations exceeded');
    (The procedure was unsuccessful.)
    STOP.
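The steps of the SOR algorithm can be sketched in plain Python as follows. This is an illustrative sketch, not the text's own implementation: the function name `sor`, its return convention, and the use of the $l_\infty$ norm of x − XO as the stopping test are assumptions made for the example.

```python
def sor(A, b, x0, omega, tol=1e-7, max_iter=100):
    """SOR iteration for Ax = b.

    Returns (x, k, converged), where k is the number of sweeps performed.
    """
    n = len(b)
    xo = list(x0)          # XO: previous iterate
    x = list(x0)
    for k in range(1, max_iter + 1):
        x = xo[:]
        for i in range(n):
            # Components j < i already hold new values; j > i use XO.
            s_new = sum(A[i][j] * x[j] for j in range(i))
            s_old = sum(A[i][j] * xo[j] for j in range(i + 1, n))
            x[i] = (1 - omega) * xo[i] + omega * (b[i] - s_new - s_old) / A[i][i]
        if max(abs(x[i] - xo[i]) for i in range(n)) < tol:
            return x, k, True
        xo = x
    return x, max_iter, False


# System of Example 1 with the initial approximation (1, 1, 1)^t.
A = [[4.0, 3.0, 0.0], [3.0, 4.0, -1.0], [0.0, -1.0, 4.0]]
b = [24.0, 30.0, -24.0]

first, _, _ = sor(A, b, [1.0, 1.0, 1.0], omega=1.25, max_iter=1)
print(first)   # first SOR iterate, compare with column k = 1 of Table 7.4

x_sor, k_sor, ok_sor = sor(A, b, [1.0, 1.0, 1.0], omega=1.25)
x_gs, k_gs, ok_gs = sor(A, b, [1.0, 1.0, 1.0], omega=1.0)   # omega = 1 is Gauss-Seidel
print(k_sor, k_gs)
```

Setting ω = 1 reduces the update to the Gauss-Seidel step, so the same routine can be used to compare the two methods on one system.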


The NumericalAnalysis subpackage of the Maple Student package implements the SOR
method in a manner similar to that of the Jacobi and Gauss-Seidel methods. The SOR results
in Table 7.4 are obtained by loading both NumericalAnalysis and LinearAlgebra, the matrix
A, the vector b = [24, 30, −24]^t, and then using the command


IterativeApproximate(A, b, initialapprox = Vector([1., 1., 1.]), tolerance = 10^(-3),
maxiterations = 20, stoppingcriterion = relative(infinity), method = SOR(1.25),
output = approximates)


The input method = SOR(1.25) indicates that the SOR method should use the value ω = 1.25.


EXERCISE SET 7.4 


1. Find the first two iterations of the SOR method with ω = 1.1 for the following linear systems, using
x^(0) = 0:

a. 3x_1 - x_2 + x_3 = 1,
   3x_1 + 6x_2 + 2x_3 = 0,
   3x_1 + 3x_2 + 7x_3 = 4.

b. 10x_1 - x_2 = 9,
   -x_1 + 10x_2 - 2x_3 = 7,
   -2x_2 + 10x_3 = 6.

c. 10x_1 + 5x_2 = 6,
   5x_1 + 10x_2 - 4x_3 = 25,
   -4x_2 + 8x_3 - x_4 = -11,
   -x_3 + 5x_4 = -11.

d. 4x_1 + x_2 + x_3 + x_5 = 6,
   -x_1 - 3x_2 + x_3 + x_4 = 6,
   2x_1 + x_2 + 5x_3 - x_4 - x_5 = 6,
   -x_1 - x_2 - x_3 + 4x_4 = 6,
   2x_2 - x_3 + x_4 + 4x_5 = 6.

2. Find the first two iterations of the SOR method with ω = 1.1 for the following linear systems, using
x^(0) = 0:
a 44,4+ »- »%=5, b.  —2x,;4+ 2+ 5X3 = 4, 
—xX,1 + 3%. + 13 = —4, xj —-2x) — 3X3 = -4, 
2x, + 2x2. + 5x3 = 1. X2 + 2x3 = 0. 
Cc 4x, + %2-— 23+ x4 = —-2, d. 4x; — XxX = 0, 
x, +4- 14- xm =, —x, +4 -— x3 =5, 
—xX,— %+5x3+ x4 = 0, — »+ 4x3 =0, 
Xp— Xt 2434+3x4 = 1. + 4x4 —-— X5 = 6, 
— x+4x5- xX = 2, 
— x + 4% = 6. 
3. Repeat Exercise 1 using ω = 1.3.
4. Repeat Exercise 2 using ω = 1.3.
5. Use the SOR method with ω = 1.2 to solve the linear systems in Exercise 1 with a tolerance
TOL = $10^{-3}$ in the $l_\infty$ norm.
6. Use the SOR method with ω = 1.2 to solve the linear systems in Exercise 2 with a tolerance
TOL = $10^{-3}$ in the $l_\infty$ norm.

7. Determine which matrices in Exercise 1 are tridiagonal and positive definite. Repeat Exercise 1 for 
these matrices using the optimal choice of w. 

8. Determine which matrices in Exercise 2 are tridiagonal and positive definite. Repeat Exercise 2 for 
these matrices using the optimal choice of w. 


9. Prove Kahan's Theorem 7.24. [Hint: If $\lambda_1, \ldots, \lambda_n$ are eigenvalues of $T_\omega$, then $\det T_\omega = \prod_{i=1}^{n} \lambda_i$.
Since $\det D^{-1} = \det(D - \omega L)^{-1}$ and the determinant of a product of matrices is the product of the
determinants of the factors, the result follows from Eq. (7.18).]

10. The forces on the bridge truss described in the opening to this chapter satisfy the equations in the 
following table: 


Joint Horizontal Component Vertical Component 
® -F+“f+fh=0 2 f, — Fy =0 
@ -8f4+8f=0 -Bf-f-zf=0 
@ -h+fs=0 fs — 10,000 = 0 
® —3 f,— fs =0 i fy — F3 =0 


-1 0 0 #% 1 060 0 0 
o -1 0 2 0 0 o 0 Fi | [ 0 | 
Fy 0 
0 0 -1 o 0 0 4 0 | as 0 | 
0 oO 0 20 -!l 5 0 fi E 0 
0 0 0 oO -1 0 0 1 | fr 0 | 
0 0 0 0 0 1t 0 0 fa e000 
B A fa 0 | 
0 0 0 -# 0 2 0 fs Lo 
Lo 0 0 0 6 0 =-# -1 


a. Explain why the system of equations was reordered. 


b. Approximate the solution of the resulting linear system to within 10~? in the /,, norm using as 
initial approximation the vector all of whose entries are 1s and the SOR method with w = 1.25. 
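11. Use the SOR method to solve the linear system Ax = b to within $10^{-5}$ in the $l_\infty$ norm, where the
entries of A are
\[
a_{i,j} =
\begin{cases}
2i, & \text{when } j = i \text{ and } i = 1, 2, \ldots, 80,\\[2pt]
0.5i, & \text{when } \begin{cases} j = i + 2 \text{ and } i = 1, 2, \ldots, 78,\\ j = i - 2 \text{ and } i = 3, 4, \ldots, 80, \end{cases}\\[2pt]
0.25i, & \text{when } \begin{cases} j = i + 4 \text{ and } i = 1, 2, \ldots, 76,\\ j = i - 4 \text{ and } i = 5, 6, \ldots, 80, \end{cases}\\[2pt]
0, & \text{otherwise},
\end{cases}
\]
and those of b are $b_i = \pi$, for each i = 1, 2, ..., 80.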


12. In Exercise 17 of Section 7.3 a technique was outlined to prove that the Gauss-Seidel method converges
when A is a positive definite matrix. Extend this method of proof to show that in this case there is also
convergence for the SOR method with 0 < ω < 2.


7.5 Error Bounds and Iterative Refinement


It seems intuitively reasonable that if $\tilde{x}$ is an approximation to the solution x of Ax = b and
the residual vector $r = b - A\tilde{x}$ has the property that $\|r\|$ is small, then $\|x - \tilde{x}\|$ would be
small as well. This is often the case, but certain systems, which occur frequently in practice,
fail to have this property.


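Example 1 The linear system Ax = b given by
\[
\begin{bmatrix} 1 & 2 \\ 1.0001 & 2 \end{bmatrix}
\begin{bmatrix} x_1 \\ x_2 \end{bmatrix}
= \begin{bmatrix} 3 \\ 3.0001 \end{bmatrix}
\]
has the unique solution $x = (1, 1)^t$. Determine the residual vector for the poor approximation
$\tilde{x} = (3, -0.0001)^t$.

Solution We have
\[
r = b - A\tilde{x} = \begin{bmatrix} 3 \\ 3.0001 \end{bmatrix}
- \begin{bmatrix} 1 & 2 \\ 1.0001 & 2 \end{bmatrix}
\begin{bmatrix} 3 \\ -0.0001 \end{bmatrix}
= \begin{bmatrix} 0.0002 \\ 0 \end{bmatrix},
\]
so $\|r\|_\infty = 0.0002$. Although the norm of the residual vector is small, the approximation
$\tilde{x} = (3, -0.0001)^t$ is obviously quite poor; in fact, $\|x - \tilde{x}\|_\infty = 2$.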


The difficulty in Example 1 is explained quite simply by noting that the solution to the
system represents the intersection of the lines
\[
l_1:\; x_1 + 2x_2 = 3 \qquad\text{and}\qquad l_2:\; 1.0001x_1 + 2x_2 = 3.0001.
\]
The point (3, −0.0001) lies on $l_2$, and the lines are nearly parallel. This implies that
(3, −0.0001) also lies close to $l_1$, even though it differs significantly from the solution of
the system, given by the intersection point (1, 1). (See Figure 7.7.)
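The numbers in Example 1 are easy to reproduce. The following sketch uses plain Python lists; the helper names `mat_vec` and `inf_norm` are illustrative and not part of the text.

```python
def mat_vec(A, x):
    """Matrix-vector product for a matrix stored as a list of rows."""
    return [sum(a * xj for a, xj in zip(row, x)) for row in A]


def inf_norm(v):
    """l-infinity norm of a vector."""
    return max(abs(vi) for vi in v)


A = [[1.0, 2.0], [1.0001, 2.0]]
b = [3.0, 3.0001]
x_exact = [1.0, 1.0]
x_approx = [3.0, -0.0001]

# Residual r = b - A x~ and actual error x - x~.
r = [bi - axi for bi, axi in zip(b, mat_vec(A, x_approx))]
err = [xe - xa for xe, xa in zip(x_exact, x_approx)]

print("residual norm:", inf_norm(r))   # about 0.0002
print("error norm:   ", inf_norm(err)) # about 2
```

The residual is four orders of magnitude smaller than the error, which is the behavior the nearly parallel lines explain geometrically.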
Example 1 was clearly constructed to show the difficulties that can, and in fact do,
arise. Had the lines not been nearly coincident, we would expect a small residual vector to
imply an accurate approximation.

In the general situation, we cannot rely on the geometry of the system to give an
indication of when problems might occur. We can, however, obtain this information by
considering the norms of the matrix A and its inverse.


Theorem 7.27 Suppose that $\tilde{x}$ is an approximation to the solution of Ax = b, A is a nonsingular matrix,
and r is the residual vector for $\tilde{x}$. Then for any natural norm,
\[
\|x - \tilde{x}\| \le \|r\| \cdot \|A^{-1}\|
\]
and if x ≠ 0 and b ≠ 0,
\[
\frac{\|x - \tilde{x}\|}{\|x\|} \le \|A\| \cdot \|A^{-1}\| \frac{\|r\|}{\|b\|}. \tag{7.20}
\]

Proof Since $r = b - A\tilde{x} = Ax - A\tilde{x}$ and A is nonsingular, we have $x - \tilde{x} = A^{-1}r$. Theorem
7.11 on page 440 implies that
\[
\|x - \tilde{x}\| = \|A^{-1}r\| \le \|A^{-1}\| \cdot \|r\|.
\]
Moreover, since b = Ax, we have $\|b\| \le \|A\| \cdot \|x\|$. So $1/\|x\| \le \|A\|/\|b\|$ and
\[
\frac{\|x - \tilde{x}\|}{\|x\|} \le \frac{\|A\| \cdot \|A^{-1}\|}{\|b\|}\,\|r\|.
\]


Condition Numbers


The inequalities in Theorem 7.27 imply that $\|A^{-1}\|$ and $\|A\| \cdot \|A^{-1}\|$ provide an indication
of the connection between the residual vector and the accuracy of the approximation. In
general, the relative error $\|x - \tilde{x}\|/\|x\|$ is of most interest, and, by Inequality (7.20), this error
is bounded by the product of $\|A\| \cdot \|A^{-1}\|$ with the relative residual for this approximation,
$\|r\|/\|b\|$. Any convenient norm can be used for this approximation; the only requirement
is that it be used consistently throughout.


Definition 7.28 The condition number of the nonsingular matrix A relative to a norm $\|\cdot\|$ is
\[
K(A) = \|A\| \cdot \|A^{-1}\|.
\]

With this notation, the inequalities in Theorem 7.27 become
\[
\|x - \tilde{x}\| \le K(A)\frac{\|r\|}{\|A\|}
\qquad\text{and}\qquad
\frac{\|x - \tilde{x}\|}{\|x\|} \le K(A)\frac{\|r\|}{\|b\|}.
\]
For any nonsingular matrix A and natural norm $\|\cdot\|$,
\[
1 = \|I\| = \|A \cdot A^{-1}\| \le \|A\| \cdot \|A^{-1}\| = K(A).
\]


A matrix A is well-conditioned if K (A) is close to 1, and is ill-conditioned when K (A) is 
significantly greater than 1. Conditioning in this context refers to the relative security that 
a small residual vector implies a correspondingly accurate approximate solution. 


Example 2 Determine the condition number for the matrix
\[
A = \begin{bmatrix} 1 & 2 \\ 1.0001 & 2 \end{bmatrix}.
\]

Solution We saw in Example 1 that the very poor approximation $(3, -0.0001)^t$ to the exact
solution $(1, 1)^t$ had a residual vector with small norm, so we should expect the condition
number of A to be large. We have $\|A\|_\infty = \max\{|1| + |2|, |1.0001| + |2|\} = 3.0001$, which
would not be considered large. However,
\[
A^{-1} = \begin{bmatrix} -10000 & 10000 \\ 5000.5 & -5000 \end{bmatrix},
\qquad\text{so}\qquad \|A^{-1}\|_\infty = 20000,
\]
and for the infinity norm, K(A) = (20000)(3.0001) = 60002. The size of the condition
number for this example should certainly keep us from making hasty accuracy decisions
based on the residual of an approximation.
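As a check on Example 2, the condition number in the infinity norm can be computed directly for a 2 × 2 matrix. The sketch below forms the inverse explicitly, which is reasonable only for a tiny example like this one; the helper names are illustrative, and the residual norm 0.0002 is taken from Example 1.

```python
def inf_norm_matrix(M):
    """Natural l-infinity matrix norm: the largest absolute row sum."""
    return max(sum(abs(v) for v in row) for row in M)


def inverse_2x2(M):
    """Explicit inverse of a 2 x 2 matrix (raises if it is singular)."""
    (a, b), (c, d) = M
    det = a * d - b * c
    if det == 0:
        raise ZeroDivisionError("matrix is singular")
    return [[d / det, -b / det], [-c / det, a / det]]


A = [[1.0, 2.0], [1.0001, 2.0]]
A_inv = inverse_2x2(A)
K = inf_norm_matrix(A) * inf_norm_matrix(A_inv)
print("K_inf(A) =", K)             # close to 60002

# Relative error bound from Theorem 7.27 for the data of Example 1:
# ||r|| = 0.0002, ||b|| = 3.0001, actual relative error = 2.
bound = K * 0.0002 / 3.0001
print("relative error bound:", bound)
```

The bound comes out near 4, so a relative error of 2 is permitted even though the residual is tiny; the large condition number is what makes the bound so weak.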


The condition number $K_\infty$ can be computed in Maple by first loading the LinearAlgebra
package and the matrix. Then the command ConditionNumber(A) gives the condition
number in the $l_\infty$ norm. For example, we can obtain the condition number of the matrix A
in Example 2 with


A := Matrix([[1, 2], [1.0001, 2]]): ConditionNumber(A)
60002.00000


Although the condition number of a matrix depends totally on the norms of the matrix
and its inverse, the calculation of the inverse is subject to roundoff error and is dependent on
the accuracy with which the calculations are performed. If the operations involve arithmetic
with t digits of accuracy, the approximate condition number for the matrix A is the norm
of the matrix times the norm of the approximation to the inverse of A, which is obtained
using t-digit arithmetic. In fact, this condition number also depends on the method used
to calculate the inverse of A. In addition, because of the number of calculations needed to
compute the inverse, we need to be able to estimate the condition number without directly
determining the inverse.

If we assume that the approximate solution to the linear system Ax = b is being
determined using t-digit arithmetic and Gaussian elimination, it can be shown (see [FM],
pp. 45-47) that the residual vector r for the approximation $\tilde{x}$ has
\[
\|r\| \approx 10^{-t}\|A\| \cdot \|\tilde{x}\|. \tag{7.21}
\]

From this approximation, an estimate for the effective condition number in t-digit
arithmetic can be obtained without the need to invert the matrix A. In actuality, this
approximation assumes that all the arithmetic operations in the Gaussian elimination technique are
performed using t-digit arithmetic but that the operations needed to determine the residual
are done in double-precision (that is, 2t-digit) arithmetic. This technique does not add
significantly to the computational effort and eliminates much of the loss of accuracy involved
with the subtraction of the nearly equal numbers that occur in the calculation of the residual.

The approximation for the t-digit condition number K(A) comes from consideration
of the linear system
\[
Ay = r.
\]
The solution to this system can be readily approximated because the multipliers for the
Gaussian elimination method have already been calculated. So A can be factored in the
form $P^tLU$ as described in Section 5 of Chapter 6. In fact $\tilde{y}$, the approximate solution of
Ay = r, satisfies
\[
\tilde{y} \approx A^{-1}r = A^{-1}(b - A\tilde{x}) = A^{-1}b - A^{-1}A\tilde{x} = x - \tilde{x}; \tag{7.22}
\]
and
\[
x \approx \tilde{x} + \tilde{y}.
\]
So $\tilde{y}$ is an estimate of the error produced when $\tilde{x}$ approximates the solution x to the original
system. Equations (7.21) and (7.22) imply that
\[
\|\tilde{y}\| \approx \|x - \tilde{x}\| = \|A^{-1}r\| \le \|A^{-1}\| \cdot \|r\| \approx \|A^{-1}\|\left(10^{-t}\|A\| \cdot \|\tilde{x}\|\right) = 10^{-t}\|\tilde{x}\|K(A).
\]
This gives an approximation for the condition number involved with solving the system
Ax = b using Gaussian elimination and the t-digit type of arithmetic just described:
\[
K(A) \approx \frac{\|\tilde{y}\|}{\|\tilde{x}\|}10^{t}. \tag{7.23}
\]
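Illustration The linear system given by
\[
\begin{bmatrix} 3.3330 & 15920 & -10.333 \\ 2.2220 & 16.710 & 9.6120 \\ 1.5611 & 5.1791 & 1.6852 \end{bmatrix}
\begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix}
= \begin{bmatrix} 15913 \\ 28.544 \\ 8.4254 \end{bmatrix}
\]
has the exact solution $x = (1, 1, 1)^t$.

Using Gaussian elimination and five-digit rounding arithmetic leads successively to the
augmented matrices
\[
\left[\begin{array}{ccc|c} 3.3330 & 15920 & -10.333 & 15913 \\ 0 & -10596 & 16.501 & -10580 \\ 0 & -7451.4 & 6.5250 & -7444.9 \end{array}\right]
\quad\text{and}\quad
\left[\begin{array}{ccc|c} 3.3330 & 15920 & -10.333 & 15913 \\ 0 & -10596 & 16.501 & -10580 \\ 0 & 0 & -5.0790 & -4.7000 \end{array}\right].
\]
The approximate solution to this system is
\[
\tilde{x} = (1.2001, 0.99991, 0.92538)^t.
\]
The residual vector corresponding to $\tilde{x}$ is computed in double precision to be
\[
\begin{aligned}
r = b - A\tilde{x}
&= \begin{bmatrix} 15913 \\ 28.544 \\ 8.4254 \end{bmatrix}
- \begin{bmatrix} 3.3330 & 15920 & -10.333 \\ 2.2220 & 16.710 & 9.6120 \\ 1.5611 & 5.1791 & 1.6852 \end{bmatrix}
\begin{bmatrix} 1.2001 \\ 0.99991 \\ 0.92538 \end{bmatrix} \\
&= \begin{bmatrix} 15913 \\ 28.544 \\ 8.4254 \end{bmatrix}
- \begin{bmatrix} 15913.00518 \\ 28.26987086 \\ 8.611560367 \end{bmatrix}
= \begin{bmatrix} -0.00518 \\ 0.27412914 \\ -0.186160367 \end{bmatrix},
\end{aligned}
\]
so
\[
\|r\|_\infty = 0.27413.
\]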


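The estimate for the condition number given in the preceding discussion is obtained by
first solving the system Ay = r for $\tilde{y}$:
\[
\begin{bmatrix} 3.3330 & 15920 & -10.333 \\ 2.2220 & 16.710 & 9.6120 \\ 1.5611 & 5.1791 & 1.6852 \end{bmatrix}
\begin{bmatrix} y_1 \\ y_2 \\ y_3 \end{bmatrix}
= \begin{bmatrix} -0.00518 \\ 0.27413 \\ -0.18616 \end{bmatrix}.
\]
This implies that $\tilde{y} = (-0.20008, 8.9987 \times 10^{-5}, 0.074607)^t$. Using the estimate in
Eq. (7.23) gives
\[
K(A) \approx \frac{\|\tilde{y}\|_\infty}{\|\tilde{x}\|_\infty}10^{5} = \frac{0.20008}{1.2001}10^{5} = 16672. \tag{7.24}
\]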


To determine the exact condition number of A, we first must find $A^{-1}$. Using five-digit
rounding arithmetic for the calculations gives the approximation:
\[
A^{-1} \approx \begin{bmatrix}
-1.1701 \times 10^{-4} & -1.4983 \times 10^{-1} & 8.5416 \times 10^{-1} \\
6.2782 \times 10^{-5} & 1.2124 \times 10^{-4} & -3.0662 \times 10^{-4} \\
-8.6631 \times 10^{-5} & 1.3846 \times 10^{-1} & -1.9689 \times 10^{-1}
\end{bmatrix}.
\]
Theorem 7.11 on page 440 implies that $\|A^{-1}\|_\infty = 1.0041$ and $\|A\|_\infty = 15934$.
As a consequence, the ill-conditioned matrix A has
\[
K(A) = (1.0041)(15934) = 15999.
\]


The estimate in (7.24) is quite close to K(A) and requires considerably less
computational effort.

Since the actual solution $x = (1, 1, 1)^t$ is known for this system, we can calculate both
\[
\|x - \tilde{x}\|_\infty = 0.2001 \qquad\text{and}\qquad \frac{\|x - \tilde{x}\|_\infty}{\|x\|_\infty} = \frac{0.2001}{1} = 0.2001.
\]
The error bounds given in Theorem 7.27 for these values are
\[
\|x - \tilde{x}\|_\infty \le K(A)\frac{\|r\|_\infty}{\|A\|_\infty} = \frac{(15999)(0.27413)}{15934} = 0.27525
\]
and
\[
\frac{\|x - \tilde{x}\|_\infty}{\|x\|_\infty} \le K(A)\frac{\|r\|_\infty}{\|b\|_\infty} = \frac{(15999)(0.27413)}{15913} = 0.27561.
\]


Iterative Refinement


In Eq. (7.22), we used the estimate $\tilde{y} \approx x - \tilde{x}$, where $\tilde{y}$ is the approximate solution to the
system Ay = r. In general, $\tilde{x} + \tilde{y}$ is a more accurate approximation to the solution of the linear
system Ax = b than the original approximation $\tilde{x}$. The method using this assumption is
called iterative refinement, or iterative improvement, and consists of performing iterations
on the system whose right-hand side is the residual vector for successive approximations
until satisfactory accuracy results.

If the process is applied using t-digit arithmetic and if $K_\infty(A) \approx 10^q$, then after k
iterations of iterative refinement the solution has approximately the smaller of t and k(t − q)
correct digits. If the system is well-conditioned, one or two iterations will indicate that the
solution is accurate. There is the possibility of significant improvement on ill-conditioned
systems unless the matrix A is so ill-conditioned that $K_\infty(A) > 10^t$. In that situation,
increased precision should be used for the calculations.


Iterative Refinement

To approximate the solution to the linear system Ax = b:

INPUT the number of equations and unknowns n; the entries $a_{ij}$, 1 ≤ i, j ≤ n, of the
matrix A; the entries $b_i$, 1 ≤ i ≤ n, of b; the maximum number of iterations N; tolerance
TOL; number of digits of precision t.

OUTPUT the approximation $xx = (xx_1, \ldots, xx_n)^t$ or a message that the number of
iterations was exceeded, and an approximation COND to $K_\infty(A)$.

Step 0 Solve the system Ax = b for $x_1, \ldots, x_n$ by Gaussian elimination saving the
    multipliers $m_{ji}$, j = i + 1, i + 2, ..., n, i = 1, 2, ..., n − 1 and noting row
    interchanges.
Step 1 Set k = 1.
Step 2 While (k ≤ N) do Steps 3-9.
    Step 3 For i = 1, 2, ..., n (Calculate r.)
        set $r_i = b_i - \sum_{j=1}^{n} a_{ij}x_j$.
        (Perform the computations in double-precision arithmetic.)
    Step 4 Solve the linear system Ay = r by using Gaussian elimination in the same
        order as in Step 0.
    Step 5 For i = 1, ..., n set $xx_i = x_i + y_i$.
    Step 6 If k = 1 then set COND = $\dfrac{\|y\|_\infty}{\|xx\|_\infty}10^t$.
    Step 7 If $\|x - xx\|_\infty$ < TOL then OUTPUT (xx);
        OUTPUT (COND);
        (The procedure was successful.)
        STOP.
    Step 8 Set k = k + 1.
    Step 9 For i = 1, ..., n set $x_i = xx_i$.
Step 10 OUTPUT ('Maximum number of iterations exceeded');
    OUTPUT (COND);
    (The procedure was unsuccessful.)
    STOP.


If t-digit arithmetic is used, a recommended stopping procedure in Step 7 is to iterate
until $|y_i^{(k)}| \le 10^{-t}$, for each i = 1, 2, ..., n.
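The refinement loop can be sketched in Python by simulating t-digit rounding arithmetic. Everything about the simulation is an assumption made for illustration: the helper `fl` rounds each intermediate result to t significant digits, ordinary double precision stands in for the 2t-digit residual computation, and the order of operations in the elimination is one reasonable choice among several (a different order can change the rounded digits).

```python
def fl(x, t):
    """Round x to t significant decimal digits (simulated t-digit rounding)."""
    return float(f"{x:.{t - 1}e}")


def factor(A, t):
    """Gaussian elimination with partial pivoting in t-digit arithmetic.

    Returns U (upper triangle), M (saved multipliers), perm (row order).
    """
    n = len(A)
    U = [[fl(v, t) for v in row] for row in A]
    M = [[0.0] * n for _ in range(n)]
    perm = list(range(n))
    for i in range(n - 1):
        p = max(range(i, n), key=lambda r: abs(U[r][i]))
        U[i], U[p] = U[p], U[i]
        M[i], M[p] = M[p], M[i]
        perm[i], perm[p] = perm[p], perm[i]
        for j in range(i + 1, n):
            M[j][i] = fl(U[j][i] / U[i][i], t)
            U[j][i] = 0.0
            for k in range(i + 1, n):
                U[j][k] = fl(U[j][k] - fl(M[j][i] * U[i][k], t), t)
    return U, M, perm


def solve(U, M, perm, b, t):
    """Reuse the saved multipliers to solve A z = b in t-digit arithmetic."""
    n = len(b)
    y = [fl(b[p], t) for p in perm]
    for i in range(n - 1):                 # forward elimination on b
        for j in range(i + 1, n):
            y[j] = fl(y[j] - fl(M[j][i] * y[i], t), t)
    z = [0.0] * n
    for i in range(n - 1, -1, -1):         # backward substitution
        s = 0.0
        for j in range(i + 1, n):
            s = fl(s + fl(U[i][j] * z[j], t), t)
        z[i] = fl(fl(y[i] - s, t) / U[i][i], t)
    return z


def iterative_refinement(A, b, t, tol, max_iter=10):
    """Returns (xx, cond, converged); cond is the Step 6 estimate of K_inf(A)."""
    n = len(b)
    U, M, perm = factor(A, t)
    x = solve(U, M, perm, b, t)                       # Step 0
    cond = None
    for k in range(1, max_iter + 1):
        # Step 3: residual in double precision (stand-in for 2t digits).
        r = [b[i] - sum(A[i][j] * x[j] for j in range(n)) for i in range(n)]
        y = solve(U, M, perm, r, t)                   # Step 4
        xx = [fl(x[i] + y[i], t) for i in range(n)]   # Step 5
        if k == 1:                                    # Step 6
            cond = max(map(abs, y)) / max(map(abs, xx)) * 10 ** t
        if max(abs(x[i] - xx[i]) for i in range(n)) < tol:   # Step 7
            return xx, cond, True
        x = xx                                        # Step 9
    return x, cond, False


A = [[3.3330, 15920.0, -10.333],
     [2.2220, 16.710, 9.6120],
     [1.5611, 5.1791, 1.6852]]
b = [15913.0, 28.544, 8.4254]
xx, cond, ok = iterative_refinement(A, b, t=5, tol=1e-5)
print(xx, cond, ok)
```

Because the factorization is computed once and reused in every pass, each refinement step costs only a residual evaluation and one pair of triangular solves.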


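Illustration In our earlier illustration we found the approximation to the linear system
\[
\begin{bmatrix} 3.3330 & 15920 & -10.333 \\ 2.2220 & 16.710 & 9.6120 \\ 1.5611 & 5.1791 & 1.6852 \end{bmatrix}
\begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix}
= \begin{bmatrix} 15913 \\ 28.544 \\ 8.4254 \end{bmatrix}
\]
using five-digit arithmetic and Gaussian elimination, to be
\[
\tilde{x}^{(1)} = (1.2001, 0.99991, 0.92538)^t
\]
and the solution to $Ay = r^{(1)}$ to be
\[
\tilde{y}^{(1)} = (-0.20008, 8.9987 \times 10^{-5}, 0.074607)^t.
\]
By Step 5 in this algorithm,
\[
\tilde{x}^{(2)} = \tilde{x}^{(1)} + \tilde{y}^{(1)} = (1.0000, 1.0000, 0.99999)^t,
\]
and the actual error in this approximation is
\[
\|x - \tilde{x}^{(2)}\|_\infty = 1 \times 10^{-5}.
\]
Using the suggested stopping technique for the algorithm, we compute $r^{(2)} = b - A\tilde{x}^{(2)}$
and solve the system $Ay^{(2)} = r^{(2)}$, which gives
\[
\tilde{y}^{(2)} = (1.5002 \times 10^{-9}, 2.0951 \times 10^{-10}, 1.0000 \times 10^{-5})^t.
\]
Since $\|\tilde{y}^{(2)}\|_\infty \le 10^{-5}$, we conclude that
\[
\tilde{x}^{(3)} = \tilde{x}^{(2)} + \tilde{y}^{(2)} = (1.0000, 1.0000, 1.0000)^t
\]
is sufficiently accurate, which is certainly correct.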


Throughout this section it has been assumed that in the linear system Ax = b, A and b
can be represented exactly. Realistically, the entries $a_{ij}$ and $b_j$ will be altered or perturbed
by an amount $\delta a_{ij}$ and $\delta b_j$, causing the linear system
\[
(A + \delta A)x = b + \delta b
\]
to be solved in place of Ax = b. Normally, if $\|\delta A\|$ and $\|\delta b\|$ are small (on the order of
$10^{-t}$), the t-digit arithmetic should yield a solution $\tilde{x}$ for which $\|x - \tilde{x}\|$ is correspondingly
small. However, in the case of ill-conditioned systems, we have seen that even if A and b are
represented exactly, rounding errors can cause $\|x - \tilde{x}\|$ to be large. The following theorem
relates the perturbations of linear systems to the condition number of a matrix. The proof
of this result can be found in [Or2], p. 33.


Theorem 7.29 Suppose A is nonsingular and
\[
\|\delta A\| < \frac{1}{\|A^{-1}\|}.
\]
The solution $\tilde{x}$ to $(A + \delta A)\tilde{x} = b + \delta b$ approximates the solution x of Ax = b with the error
estimate
\[
\frac{\|x - \tilde{x}\|}{\|x\|} \le \frac{K(A)\|A\|}{\|A\| - K(A)\|\delta A\|}\left(\frac{\|\delta b\|}{\|b\|} + \frac{\|\delta A\|}{\|A\|}\right). \tag{7.25}
\]


James Hardy Wilkinson (1919-1986) is best known for his extensive work in numerical
methods for solving linear equations and eigenvalue problems. He also developed the
technique of backward error analysis.

The estimate in inequality (7.25) states that if the matrix A is well-conditioned (that 
is, K(A) is not too large), then small changes in A and b produce correspondingly small 
changes in the solution x. If, on the other hand, A is ill-conditioned, then small changes in 
A and b may produce large changes in x. 

The theorem is independent of the particular numerical procedure used to solve Ax = b.
It can be shown, by means of a backward error analysis (see [Wil1] or [Wil2]), that if
Gaussian elimination with pivoting is used to solve Ax = b in t-digit arithmetic, the numerical
solution $\tilde{x}$ is the actual solution of a linear system
\[
(A + \delta A)\tilde{x} = b, \qquad\text{where}\qquad \|\delta A\|_\infty \le f(n)10^{1-t}\max_{i,j,k}|a_{ij}^{(k)}|,
\]
for some function f(n). Wilkinson found that in practice f(n) ≈ n and, at worst,
$f(n) \le 1.01(n^3 + 3n^2)$.


EXERCISE SET 7.5 


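1. Compute the condition numbers of the following matrices relative to $\|\cdot\|_\infty$.

a. $\begin{bmatrix} \tfrac12 & \tfrac13 \\ \tfrac13 & \tfrac14 \end{bmatrix}$
b. $\begin{bmatrix} 3.9 & 1.6 \\ 6.8 & 2.9 \end{bmatrix}$
c. $\begin{bmatrix} 1 & 2 \\ 1.00001 & 2 \end{bmatrix}$
d. $\begin{bmatrix} 1.003 & 58.09 \\ 5.550 & 321.8 \end{bmatrix}$

2. Compute the condition numbers of the following matrices relative to $\|\cdot\|_\infty$.

a. $\begin{bmatrix} 0.03 & 58.9 \\ 5.31 & -6.10 \end{bmatrix}$
b. $\begin{bmatrix} 58.9 & 0.03 \\ -6.10 & 5.31 \end{bmatrix}$
c. $\begin{bmatrix} 1 & -1 & -1 \\ 0 & 1 & -1 \\ 0 & 0 & -1 \end{bmatrix}$
d. $\begin{bmatrix} 0.04 & 0.01 & -0.01 \\ 0.2 & 0.5 & -0.2 \\ 1 & 2 & 4 \end{bmatrix}$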

3. The following linear systems Ax = b have x as the actual solution and $\tilde{x}$ as an approximate solution.
Using the results of Exercise 1, compute
\[
\|x - \tilde{x}\|_\infty \qquad\text{and}\qquad K_\infty(A)\frac{\|b - A\tilde{x}\|_\infty}{\|A\|_\infty}.
\]
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. Cae b. 39x) + 1.6% =5.5, 
’ : - 6.8x, + 2.9x) = 9.7, 
gre 168” x = (1,1), 
1 A" x = (0.98, 1.1)’. 
Seal oa) s 
(7-<) 
% = (0.142, —0.166)'. 
c. xX, + 2x) = 3, d. = 1.003x,; + 58.09x. = 68.12, 
1.0001x, + 2x2 = 3.0001, 5.550x, + 321.8x2. = 377.3, 
x= (1,1), x = (10, 1)’, 
X = (0.96, 1.02). x = (—10,1). 


4. The following linear systems Ax = b have x as the actual solution and $\tilde{x}$ as an approximate solution.
Using the results of Exercise 2, compute
\[
\|x - \tilde{x}\|_\infty \qquad\text{and}\qquad K_\infty(A)\frac{\|b - A\tilde{x}\|_\infty}{\|A\|_\infty}.
\]
a. 0.03x; + 58.9x2 = 59.2, b.  58.9x; + 0.03x2 = 59.2, 
5.31x, — 6.10x2 = 47.0, —6.10x; + 5.31x2 = 47.0, 
x = (10,1), x = (1, 10)’, 
XK = (30.0, 0.990)’. K = (1.02, 9.98)’. 
Cc. Xi -— X12 - B= 21, d. 0.04x; + 0.01x2 = 0.01x3 = 0.06, 
xX. - HB = 0, 0.2x; + 0.5x2 = 0.2x3 = 0.3, 
—XxX=7. Xi + 2x. + 4x3 = 11, 
x = (0,-2,—7)', x = (1.827586, 0.6551724, 1.965517)’, 
x = (—0.1, —3.15, —3.14)’. x = (1.8, 0.64, 1.9)’. 


5. (i) Use Gaussian elimination and three-digit rounding arithmetic to approximate the solutions to the 
following linear systems. (ii) Then use one iteration of iterative refinement to improve the approxi- 
mation, and compare the approximations to the actual solutions. 


a. 0.03x; + 58.9x. = 59.2, 
5.31x,; — 6.10x2 = 47.0. 
Actual solution (10, 1)’. 


b. 3.3330x, + 15920x2. + 10.333x3 = 7953, 

2.2220x, + 16.710x2 + 9.6120x3 = 0.965, 

—1.5611x, + 5.1792x) — 1.6855x3 = 2.714. 
Actual solution (1,0.5,—1)’. 


ce = 1.19x, + 2.11%. — 100x3 + x4 = 1.12, 

14.2x,; — 0.122x. + 12.2x3 — x4 = 3.44, 

100x> = 99.9x3 + xy = 2.15, 

15.3x; + 0.110x2 — 13.143 — x4 = 4.16. 
Actual solution (0.17682530, 0.01269269, —0.02065405, —1.18260870)’. 


d. 0 xy — eX + V2x3 — V3x4 = V1, 
wx, + ex) — ex, + oe = 0, 
V5x, — VOx. + x3 — V2 
x, + e7xX) _ J 7x3 + oH /2. 

Actual solution (0.78839378, —3.12541367, 0.16759660, 4.55700252)’. 


6. Repeat Exercise 5 using four-digit rounding arithmetic. 


7. The linear system
\[
\begin{bmatrix} 1 & 2 \\ 1.0001 & 2 \end{bmatrix}
\begin{bmatrix} x_1 \\ x_2 \end{bmatrix}
= \begin{bmatrix} 3 \\ 3.0001 \end{bmatrix}
\]
has solution $(1, 1)^t$. Change A slightly to
\[
\begin{bmatrix} 1 & 2 \\ 0.9999 & 2 \end{bmatrix},
\]
and consider the linear system
\[
\begin{bmatrix} 1 & 2 \\ 0.9999 & 2 \end{bmatrix}
\begin{bmatrix} x_1 \\ x_2 \end{bmatrix}
= \begin{bmatrix} 3 \\ 3.0001 \end{bmatrix}.
\]
Compute the new solution using five-digit rounding arithmetic, and compare the actual error to the
estimate (7.25). Is A ill-conditioned?

8. The linear system Ax = b given by
\[
\begin{bmatrix} 1 & 2 \\ 1.00001 & 2 \end{bmatrix}
\begin{bmatrix} x_1 \\ x_2 \end{bmatrix}
= \begin{bmatrix} 3 \\ 3.00001 \end{bmatrix}
\]
has solution $(1, 1)^t$. Use seven-digit rounding arithmetic to find the solution of the perturbed system
\[
\begin{bmatrix} 1 & 2 \\ 1.000011 & 2 \end{bmatrix}
\begin{bmatrix} x_1 \\ x_2 \end{bmatrix}
= \begin{bmatrix} 3.00001 \\ 3.00003 \end{bmatrix},
\]
and compare the actual error to the estimate (7.25). Is A ill-conditioned?

9. Show that if B is singular, then
\[
\frac{1}{K(A)} \le \frac{\|A - B\|}{\|A\|}.
\]
[Hint: There exists a vector with $\|x\| = 1$, such that Bx = 0. Derive the estimate using
$\|Ax\| \ge \|x\|/\|A^{-1}\|$.]

10. Using Exercise 9, estimate the condition numbers for the following matrices:

a. $\begin{bmatrix} 1 & 2 \\ 1.0001 & 2 \end{bmatrix}$
b. $\begin{bmatrix} 3.9 & 1.6 \\ 6.8 & 2.9 \end{bmatrix}$

11. The n × n Hilbert matrix $H^{(n)}$ (see page 512) defined by
\[
H^{(n)}_{ij} = \frac{1}{i + j - 1}, \qquad 1 \le i, j \le n,
\]
is an ill-conditioned matrix that arises in solving the normal equations for the coefficients of the
least-squares polynomial (see Example 1 of Section 8.2).

a. Show that
\[
[H^{(4)}]^{-1} = \begin{bmatrix}
16 & -120 & 240 & -140 \\
-120 & 1200 & -2700 & 1680 \\
240 & -2700 & 6480 & -4200 \\
-140 & 1680 & -4200 & 2800
\end{bmatrix},
\]
and compute $K_\infty(H^{(4)})$.

b. Show that
\[
[H^{(5)}]^{-1} = \begin{bmatrix}
25 & -300 & 1050 & -1400 & 630 \\
-300 & 4800 & -18900 & 26880 & -12600 \\
1050 & -18900 & 79380 & -117600 & 56700 \\
-1400 & 26880 & -117600 & 179200 & -88200 \\
630 & -12600 & 56700 & -88200 & 44100
\end{bmatrix},
\]
and compute $K_\infty(H^{(5)})$.

c. Solve the linear system
\[
H^{(4)} \begin{bmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \end{bmatrix} = \begin{bmatrix} 1 \\ 0 \\ 0 \\ 1 \end{bmatrix}
\]
using five-digit rounding arithmetic, and compare the actual error to that estimated in (7.25).


12. Use four-digit rounding arithmetic to compute the inverse $H^{-1}$ of the 3 × 3 Hilbert matrix H, and
then compute $\hat{H} = (H^{-1})^{-1}$. Determine $\|H - \hat{H}\|_\infty$.


7.6 The Conjugate Gradient Method


The conjugate gradient method of Hestenes and Stiefel [HS] was originally developed as
a direct method designed to solve an n × n positive definite linear system. As a direct
method it is generally inferior to Gaussian elimination with pivoting. Both methods require
n steps to determine a solution, and the steps of the conjugate gradient method are more
computationally expensive than those of Gaussian elimination.

However, the conjugate gradient method is useful when employed as an iterative
approximation method for solving large sparse systems with nonzero entries occurring in
predictable patterns. These problems frequently arise in the solution of boundary-value
problems. When the matrix has been preconditioned to make the calculations more
effective, good results are obtained in only about $\sqrt{n}$ iterations. Employed in this way, the method
is preferred over Gaussian elimination and the previously discussed iterative methods.

Magnus Hestenes (1906-1991) and Eduard Stiefel (1909-1978) published the original
paper on the conjugate gradient method in 1952 while working at the Institute for Numerical
Analysis on the campus of UCLA.

Throughout this section we assume that the matrix A is positive definite. We will use
the inner product notation
\[
\langle x, y \rangle = x^t y, \tag{7.26}
\]
where x and y are n-dimensional vectors. We will also need some additional standard results
from linear algebra. A review of this material is found in Section 9.1.
The next result follows easily from the properties of transposes (see Exercise 12).


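Theorem 7.30 For any vectors x, y, and z and any real number α, we have

(a) $\langle x, y \rangle = \langle y, x \rangle$;
(b) $\langle \alpha x, y \rangle = \langle x, \alpha y \rangle = \alpha\langle x, y \rangle$;
(c) $\langle x + z, y \rangle = \langle x, y \rangle + \langle z, y \rangle$;
(d) $\langle x, x \rangle \ge 0$;
(e) $\langle x, x \rangle = 0$ if and only if x = 0.

When A is positive definite, $\langle x, Ax \rangle = x^tAx > 0$ unless x = 0. Also, since A is
symmetric, we have $x^tAy = x^tA^ty = (Ax)^ty$, so in addition to the results in Theorem 7.30,
we have for each x and y,
\[
\langle x, Ay \rangle = (Ax)^ty = x^tA^ty = x^tAy = \langle Ax, y \rangle. \tag{7.27}
\]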


The following result is a basic tool in the development of the conjugate gradient method. 


Theorem 7.31 The vector x* is a solution to the positive definite linear system Ax = b if and only if x*
produces the minimal value of
\[
g(x) = \langle x, Ax \rangle - 2\langle x, b \rangle.
\]

Proof Let x and v ≠ 0 be fixed vectors and t a real number variable. We have
\[
\begin{aligned}
g(x + tv) &= \langle x + tv, Ax + tAv \rangle - 2\langle x + tv, b \rangle \\
&= \langle x, Ax \rangle + t\langle v, Ax \rangle + t\langle x, Av \rangle + t^2\langle v, Av \rangle - 2\langle x, b \rangle - 2t\langle v, b \rangle \\
&= \langle x, Ax \rangle - 2\langle x, b \rangle + 2t\langle v, Ax \rangle - 2t\langle v, b \rangle + t^2\langle v, Av \rangle,
\end{aligned}
\]
so
\[
g(x + tv) = g(x) - 2t\langle v, b - Ax \rangle + t^2\langle v, Av \rangle. \tag{7.28}
\]
With x and v fixed we can define the quadratic function h in t by
\[
h(t) = g(x + tv).
\]
Then h assumes a minimal value when h'(t) = 0, because its $t^2$ coefficient, $\langle v, Av \rangle$, is
positive. Because
\[
h'(t) = -2\langle v, b - Ax \rangle + 2t\langle v, Av \rangle,
\]
the minimum occurs when
\[
\hat{t} = \frac{\langle v, b - Ax \rangle}{\langle v, Av \rangle},
\]
and, from Equation (7.28),
\[
\begin{aligned}
h(\hat{t}) = g(x + \hat{t}v)
&= g(x) - 2\hat{t}\langle v, b - Ax \rangle + \hat{t}^2\langle v, Av \rangle \\
&= g(x) - 2\frac{\langle v, b - Ax \rangle}{\langle v, Av \rangle}\langle v, b - Ax \rangle + \left(\frac{\langle v, b - Ax \rangle}{\langle v, Av \rangle}\right)^2\langle v, Av \rangle \\
&= g(x) - \frac{\langle v, b - Ax \rangle^2}{\langle v, Av \rangle}.
\end{aligned}
\]
So for any vector v ≠ 0, we have $g(x + \hat{t}v) < g(x)$ unless $\langle v, b - Ax \rangle = 0$, in which case
$g(x) = g(x + \hat{t}v)$. This is the basic result we need to prove Theorem 7.31.

Suppose x* satisfies Ax* = b. Then $\langle v, b - Ax^* \rangle = 0$ for any vector v, and g(x) cannot
be made any smaller than g(x*). Thus, x* minimizes g.

On the other hand, suppose that x* is a vector that minimizes g. Then for any vector v,
we have $g(x^* + \hat{t}v) \ge g(x^*)$. Thus, $\langle v, b - Ax^* \rangle = 0$. This implies that b − Ax* = 0 and,
consequently, that Ax* = b.


To begin the conjugate gradient method, we choose x, an approximate solution to
Ax* = b, and v ≠ 0, which gives a search direction in which to move away from x to
improve the approximation. Let r = b − Ax be the residual vector associated with x and
\[
t = \frac{\langle v, b - Ax \rangle}{\langle v, Av \rangle} = \frac{\langle v, r \rangle}{\langle v, Av \rangle}.
\]
If r ≠ 0 and if v and r are not orthogonal, then x + tv gives a smaller value for g than g(x)
and is presumably closer to x* than is x. This suggests the following method.

Let $x^{(0)}$ be an initial approximation to x*, and let $v^{(1)} \ne 0$ be an initial search direction.
For k = 1, 2, 3, ..., we compute
\[
t_k = \frac{\langle v^{(k)}, b - Ax^{(k-1)} \rangle}{\langle v^{(k)}, Av^{(k)} \rangle},
\qquad
x^{(k)} = x^{(k-1)} + t_kv^{(k)},
\]
and choose a new search direction $v^{(k+1)}$. The object is to make this selection so that the
sequence of approximations $\{x^{(k)}\}$ converges rapidly to x*.
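To choose the search directions, we view g as a function of the components of
$x = (x_1, x_2, \ldots, x_n)^t$. Thus,
\[
g(x_1, x_2, \ldots, x_n) = \langle x, Ax \rangle - 2\langle x, b \rangle = \sum_{i=1}^{n}\sum_{j=1}^{n} a_{ij}x_ix_j - 2\sum_{i=1}^{n} x_ib_i.
\]
Taking partial derivatives with respect to the component variables $x_k$ gives
\[
\frac{\partial g}{\partial x_k}(x) = 2\sum_{i=1}^{n} a_{ki}x_i - 2b_k,
\]
which is the kth component of the vector 2(Ax − b). Therefore, the gradient of g is
\[
\nabla g(x) = \left(\frac{\partial g}{\partial x_1}(x), \frac{\partial g}{\partial x_2}(x), \ldots, \frac{\partial g}{\partial x_n}(x)\right)^t = 2(Ax - b) = -2r,
\]
where the vector r is the residual vector for x.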

From multivariable calculus, we know that the direction of greatest decrease in the
value of g(x) is the direction given by $-\nabla g(x)$; that is, in the direction of the residual r.
The method that chooses
\[
v^{(k+1)} = r^{(k)} = b - Ax^{(k)}
\]
is called the method of steepest descent. Although we will see in Section 10.4 that this
method has merit for nonlinear systems and optimization problems, it is not used for linear
systems because of slow convergence.
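To make the steepest descent choice concrete, the sketch below applies it to the positive definite system used in Example 2 of Section 7.4. The function name `steepest_descent`, the residual-based stopping test, and the tolerance are assumptions made for illustration.

```python
def mat_vec(A, x):
    """Matrix-vector product for a matrix stored as a list of rows."""
    return [sum(a * xj for a, xj in zip(row, x)) for row in A]


def dot(u, v):
    """Inner product <u, v> = u^t v."""
    return sum(ui * vi for ui, vi in zip(u, v))


def steepest_descent(A, b, x0, tol=1e-10, max_iter=10000):
    """Steepest descent for positive definite A; returns (x, steps taken)."""
    x = list(x0)
    for k in range(max_iter):
        r = [bi - axi for bi, axi in zip(b, mat_vec(A, x))]   # r = b - Ax
        if max(abs(ri) for ri in r) < tol:
            return x, k
        # Search direction v = r, step t = <v, r> / <v, Av>.
        t = dot(r, r) / dot(r, mat_vec(A, r))
        x = [xi + t * ri for xi, ri in zip(x, r)]
    return x, max_iter


A = [[4.0, 3.0, 0.0], [3.0, 4.0, -1.0], [0.0, -1.0, 4.0]]
b = [24.0, 30.0, -24.0]
x, steps = steepest_descent(A, b, [0.0, 0.0, 0.0])
print(x, steps)
```

On this 3 × 3 system the iteration needs many more than three steps to reach the tolerance, which illustrates the slow convergence mentioned above.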

An alternative approach uses a set of nonzero direction vectors $\{v^{(1)}, \ldots, v^{(n)}\}$ that
satisfy
\[
\langle v^{(i)}, Av^{(j)} \rangle = 0, \qquad\text{if } i \ne j.
\]
This is called an A-orthogonality condition, and the set of vectors $\{v^{(1)}, \ldots, v^{(n)}\}$ is said
to be A-orthogonal. It is not difficult to show that a set of A-orthogonal vectors associated
with the positive definite matrix A is linearly independent. (See Exercise 13(a).) This set of
search directions gives
\[
t_k = \frac{\langle v^{(k)}, b - Ax^{(k-1)} \rangle}{\langle v^{(k)}, Av^{(k)} \rangle} = \frac{\langle v^{(k)}, r^{(k-1)} \rangle}{\langle v^{(k)}, Av^{(k)} \rangle}
\]
and $x^{(k)} = x^{(k-1)} + t_kv^{(k)}$.

The following theorem shows that this choice of search directions gives convergence
in at most n steps, so as a direct method it produces the exact solution, assuming that the
arithmetic is exact.


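Theorem 7.32 Let $\{v^{(1)}, \ldots, v^{(n)}\}$ be an A-orthogonal set of nonzero vectors associated with the positive
definite matrix A, and let $x^{(0)}$ be arbitrary. Define
\[
t_k = \frac{\langle v^{(k)}, b - Ax^{(k-1)} \rangle}{\langle v^{(k)}, Av^{(k)} \rangle}
\qquad\text{and}\qquad
x^{(k)} = x^{(k-1)} + t_kv^{(k)},
\]
for k = 1, 2, ..., n. Then, assuming exact arithmetic, $Ax^{(n)} = b$.

Proof Since, for each k = 1, 2, ..., n, $x^{(k)} = x^{(k-1)} + t_kv^{(k)}$, we have
\[
\begin{aligned}
Ax^{(n)} &= Ax^{(n-1)} + t_nAv^{(n)} \\
&= (Ax^{(n-2)} + t_{n-1}Av^{(n-1)}) + t_nAv^{(n)} \\
&\;\;\vdots \\
&= Ax^{(0)} + t_1Av^{(1)} + t_2Av^{(2)} + \cdots + t_nAv^{(n)}.
\end{aligned}
\]
Subtracting b from this result yields
\[
Ax^{(n)} - b = Ax^{(0)} - b + t_1Av^{(1)} + t_2Av^{(2)} + \cdots + t_nAv^{(n)}.
\]
We now take the inner product of both sides with the vector $v^{(k)}$ and use the properties of
inner products and the fact that A is symmetric to obtain
\[
\begin{aligned}
\langle Ax^{(n)} - b, v^{(k)} \rangle &= \langle Ax^{(0)} - b, v^{(k)} \rangle + t_1\langle Av^{(1)}, v^{(k)} \rangle + \cdots + t_n\langle Av^{(n)}, v^{(k)} \rangle \\
&= \langle Ax^{(0)} - b, v^{(k)} \rangle + t_1\langle v^{(1)}, Av^{(k)} \rangle + \cdots + t_n\langle v^{(n)}, Av^{(k)} \rangle.
\end{aligned}
\]
The A-orthogonality property gives, for each k,
\[
\langle Ax^{(n)} - b, v^{(k)} \rangle = \langle Ax^{(0)} - b, v^{(k)} \rangle + t_k\langle v^{(k)}, Av^{(k)} \rangle. \tag{7.29}
\]
However $t_k\langle v^{(k)}, Av^{(k)} \rangle = \langle v^{(k)}, b - Ax^{(k-1)} \rangle$, so
\[
\begin{aligned}
t_k\langle v^{(k)}, Av^{(k)} \rangle &= \langle v^{(k)}, b - Ax^{(0)} + Ax^{(0)} - Ax^{(1)} + \cdots - Ax^{(k-2)} + Ax^{(k-2)} - Ax^{(k-1)} \rangle \\
&= \langle v^{(k)}, b - Ax^{(0)} \rangle + \langle v^{(k)}, Ax^{(0)} - Ax^{(1)} \rangle + \cdots + \langle v^{(k)}, Ax^{(k-2)} - Ax^{(k-1)} \rangle.
\end{aligned}
\]
But for any i,
\[
x^{(i)} = x^{(i-1)} + t_iv^{(i)} \qquad\text{and}\qquad Ax^{(i)} = Ax^{(i-1)} + t_iAv^{(i)},
\]
so
\[
Ax^{(i-1)} - Ax^{(i)} = -t_iAv^{(i)}.
\]
Thus
\[
t_k\langle v^{(k)}, Av^{(k)} \rangle = \langle v^{(k)}, b - Ax^{(0)} \rangle - t_1\langle v^{(k)}, Av^{(1)} \rangle - \cdots - t_{k-1}\langle v^{(k)}, Av^{(k-1)} \rangle.
\]
Because of the A-orthogonality, $\langle v^{(k)}, Av^{(i)} \rangle = 0$, for i ≠ k, so
\[
\langle v^{(k)}, Av^{(k)} \rangle t_k = \langle v^{(k)}, b - Ax^{(0)} \rangle.
\]
From Eq. (7.29),
\[
\begin{aligned}
\langle Ax^{(n)} - b, v^{(k)} \rangle &= \langle Ax^{(0)} - b, v^{(k)} \rangle + \langle v^{(k)}, b - Ax^{(0)} \rangle \\
&= \langle Ax^{(0)} - b, v^{(k)} \rangle + \langle b - Ax^{(0)}, v^{(k)} \rangle \\
&= \langle Ax^{(0)} - b, v^{(k)} \rangle - \langle Ax^{(0)} - b, v^{(k)} \rangle = 0.
\end{aligned}
\]
Hence the vector $Ax^{(n)} - b$ is orthogonal to the A-orthogonal set of vectors $\{v^{(1)}, \ldots, v^{(n)}\}$.
From this, it follows (see Exercise 13(b)) that $Ax^{(n)} - b = 0$, so $Ax^{(n)} = b$.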
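Example 1 The linear system

\begin{align*}
4x_1 + 3x_2 \phantom{{}- x_3} &= 24, \\
3x_1 + 4x_2 - x_3 &= 30, \\
-x_2 + 4x_3 &= -24
\end{align*}

has the exact solution $\mathbf{x}^* = (3, 4, -5)^t$. Show that the procedure described in Theorem 7.32
with $\mathbf{x}^{(0)} = (0, 0, 0)^t$ produces this exact solution after three iterations.

Solution We established in Example 2 of Section 7.4 that the coefficient matrix

\[ A = \begin{bmatrix} 4 & 3 & 0 \\ 3 & 4 & -1 \\ 0 & -1 & 4 \end{bmatrix} \]

of this system is positive definite. Let $\mathbf{v}^{(1)} = (1, 0, 0)^t$, $\mathbf{v}^{(2)} = (-3/4, 1, 0)^t$, and
$\mathbf{v}^{(3)} = (-3/7, 4/7, 1)^t$. Then

\[ \langle \mathbf{v}^{(1)}, A\mathbf{v}^{(2)} \rangle = \mathbf{v}^{(1)t} A \mathbf{v}^{(2)} = (1, 0, 0) \begin{bmatrix} 4 & 3 & 0 \\ 3 & 4 & -1 \\ 0 & -1 & 4 \end{bmatrix} \begin{bmatrix} -\frac{3}{4} \\ 1 \\ 0 \end{bmatrix} = 0, \]
\[ \langle \mathbf{v}^{(1)}, A\mathbf{v}^{(3)} \rangle = (1, 0, 0) \begin{bmatrix} 4 & 3 & 0 \\ 3 & 4 & -1 \\ 0 & -1 & 4 \end{bmatrix} \begin{bmatrix} -\frac{3}{7} \\ \frac{4}{7} \\ 1 \end{bmatrix} = 0, \]
and
\[ \langle \mathbf{v}^{(2)}, A\mathbf{v}^{(3)} \rangle = \left( -\tfrac{3}{4}, 1, 0 \right) \begin{bmatrix} 4 & 3 & 0 \\ 3 & 4 & -1 \\ 0 & -1 & 4 \end{bmatrix} \begin{bmatrix} -\frac{3}{7} \\ \frac{4}{7} \\ 1 \end{bmatrix} = 0. \]

Hence $\{\mathbf{v}^{(1)}, \mathbf{v}^{(2)}, \mathbf{v}^{(3)}\}$ is an A-orthogonal set.
Applying the iterations described in Theorem 7.32 for A with $\mathbf{x}^{(0)} = (0, 0, 0)^t$ and
$\mathbf{b} = (24, 30, -24)^t$ gives

\[ \mathbf{r}^{(0)} = \mathbf{b} - A\mathbf{x}^{(0)} = \mathbf{b} = (24, 30, -24)^t, \]

so

\[ \langle \mathbf{v}^{(1)}, \mathbf{r}^{(0)} \rangle = \mathbf{v}^{(1)t}\mathbf{r}^{(0)} = 24, \quad \langle \mathbf{v}^{(1)}, A\mathbf{v}^{(1)} \rangle = 4, \quad \text{and} \quad t_1 = \frac{24}{4} = 6. \]

Hence
\[ \mathbf{x}^{(1)} = \mathbf{x}^{(0)} + t_1 \mathbf{v}^{(1)} = (0, 0, 0)^t + 6(1, 0, 0)^t = (6, 0, 0)^t. \]
Continuing, we have

\[ \mathbf{r}^{(1)} = \mathbf{b} - A\mathbf{x}^{(1)} = (0, 12, -24)^t; \quad t_2 = \frac{\langle \mathbf{v}^{(2)}, \mathbf{r}^{(1)} \rangle}{\langle \mathbf{v}^{(2)}, A\mathbf{v}^{(2)} \rangle} = \frac{12}{7/4} = \frac{48}{7}; \]

\[ \mathbf{x}^{(2)} = \mathbf{x}^{(1)} + t_2 \mathbf{v}^{(2)} = (6, 0, 0)^t + \frac{48}{7}\left( -\frac{3}{4}, 1, 0 \right)^t = \left( \frac{6}{7}, \frac{48}{7}, 0 \right)^t; \]

\[ \mathbf{r}^{(2)} = \mathbf{b} - A\mathbf{x}^{(2)} = \left( 0, 0, -\frac{120}{7} \right)^t; \quad t_3 = \frac{\langle \mathbf{v}^{(3)}, \mathbf{r}^{(2)} \rangle}{\langle \mathbf{v}^{(3)}, A\mathbf{v}^{(3)} \rangle} = \frac{-120/7}{24/7} = -5; \]

and
\[ \mathbf{x}^{(3)} = \mathbf{x}^{(2)} + t_3 \mathbf{v}^{(3)} = \left( \frac{6}{7}, \frac{48}{7}, 0 \right)^t + (-5)\left( -\frac{3}{7}, \frac{4}{7}, 1 \right)^t = (3, 4, -5)^t. \]
Since we applied the technique n = 3 times, this must be the actual solution.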
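
The arithmetic in Example 1 can be checked mechanically. The sketch below, offered only as an illustration, applies the update of Theorem 7.32 with exact rational arithmetic from Python's standard `fractions` module; the function name `conjugate_directions` and the helpers are assumptions introduced here, not names from the text.

```python
from fractions import Fraction as F


def matvec(A, x):
    """Return the matrix-vector product A x."""
    return [sum(a * xj for a, xj in zip(row, x)) for row in A]


def dot(u, v):
    """Return the inner product <u, v>."""
    return sum(ui * vi for ui, vi in zip(u, v))


def conjugate_directions(A, b, x0, directions):
    """Apply x(k) = x(k-1) + t_k v(k) with t_k = <v(k), r(k-1)> / <v(k), A v(k)>."""
    x = list(x0)
    for v in directions:
        r = [bi - axi for bi, axi in zip(b, matvec(A, x))]   # r(k-1) = b - A x(k-1)
        t = dot(v, r) / dot(v, matvec(A, v))
        x = [xi + t * vi for xi, vi in zip(x, v)]
    return x


A = [[F(4), F(3), F(0)], [F(3), F(4), F(-1)], [F(0), F(-1), F(4)]]
b = [F(24), F(30), F(-24)]
V = [[F(1), F(0), F(0)], [F(-3, 4), F(1), F(0)], [F(-3, 7), F(4, 7), F(1)]]

x3 = conjugate_directions(A, b, [F(0)] * 3, V)
print(x3)
```

Because the arithmetic is exact, the third iterate equals $(3, 4, -5)^t$ with no rounding error, which is the behavior Theorem 7.32 predicts.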


Before discussing how to determine the A-orthogonal set, we will continue the development.
The use of an A-orthogonal set $\{\mathbf{v}^{(1)}, \ldots, \mathbf{v}^{(n)}\}$ of direction vectors gives what is
called a conjugate direction method. The following theorem shows the orthogonality of the
residual vectors $\mathbf{r}^{(k)}$ and the direction vectors $\mathbf{v}^{(j)}$. A proof of this result using mathematical
induction is considered in Exercise 14.

Theorem 7.33 The residual vectors $\mathbf{r}^{(k)}$, where k = 1, 2, ..., n, for a conjugate direction method, satisfy
the equations
\[ \langle \mathbf{r}^{(k)}, \mathbf{v}^{(j)} \rangle = 0, \quad \text{for each } j = 1, 2, \ldots, k. \]

The conjugate gradient method of Hestenes and Stiefel chooses the search directions
$\{\mathbf{v}^{(k)}\}$ during the iterative process so that the residual vectors $\{\mathbf{r}^{(k)}\}$ are mutually orthogonal.
To construct the direction vectors $\{\mathbf{v}^{(1)}, \mathbf{v}^{(2)}, \ldots\}$ and the approximations $\{\mathbf{x}^{(1)}, \mathbf{x}^{(2)}, \ldots\}$, we
start with an initial approximation $\mathbf{x}^{(0)}$ and use the steepest descent direction $\mathbf{r}^{(0)} = \mathbf{b} - A\mathbf{x}^{(0)}$
as the first search direction $\mathbf{v}^{(1)}$.


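Assume that the conjugate directions $\mathbf{v}^{(1)}, \ldots, \mathbf{v}^{(k-1)}$ and the approximations $\mathbf{x}^{(1)}, \ldots,
\mathbf{x}^{(k-1)}$ have been computed with

\[ \mathbf{x}^{(k-1)} = \mathbf{x}^{(k-2)} + t_{k-1}\mathbf{v}^{(k-1)}, \]
where
\[ \langle \mathbf{v}^{(i)}, A\mathbf{v}^{(j)} \rangle = 0 \quad \text{and} \quad \langle \mathbf{r}^{(i)}, \mathbf{r}^{(j)} \rangle = 0, \quad \text{for } i \neq j. \]

If $\mathbf{x}^{(k-1)}$ is the solution to $A\mathbf{x} = \mathbf{b}$, we are done. Otherwise, $\mathbf{r}^{(k-1)} = \mathbf{b} - A\mathbf{x}^{(k-1)} \neq \mathbf{0}$ and
Theorem 7.33 implies that $\langle \mathbf{r}^{(k-1)}, \mathbf{v}^{(i)} \rangle = 0$, for each i = 1, 2, ..., k - 1.
We use $\mathbf{r}^{(k-1)}$ to generate $\mathbf{v}^{(k)}$ by setting

\[ \mathbf{v}^{(k)} = \mathbf{r}^{(k-1)} + s_{k-1}\mathbf{v}^{(k-1)}. \]
We want to choose $s_{k-1}$ so that
\[ \langle \mathbf{v}^{(k-1)}, A\mathbf{v}^{(k)} \rangle = 0. \]
Since
\[ A\mathbf{v}^{(k)} = A\mathbf{r}^{(k-1)} + s_{k-1}A\mathbf{v}^{(k-1)} \]
and
\[ \langle \mathbf{v}^{(k-1)}, A\mathbf{v}^{(k)} \rangle = \langle \mathbf{v}^{(k-1)}, A\mathbf{r}^{(k-1)} \rangle + s_{k-1}\langle \mathbf{v}^{(k-1)}, A\mathbf{v}^{(k-1)} \rangle, \]
we will have $\langle \mathbf{v}^{(k-1)}, A\mathbf{v}^{(k)} \rangle = 0$ when

\[ s_{k-1} = -\frac{\langle \mathbf{v}^{(k-1)}, A\mathbf{r}^{(k-1)} \rangle}{\langle \mathbf{v}^{(k-1)}, A\mathbf{v}^{(k-1)} \rangle}. \]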
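It can also be shown that with this choice of $s_{k-1}$ we have $\langle \mathbf{v}^{(k)}, A\mathbf{v}^{(i)} \rangle = 0$, for each
i = 1, 2, ..., k - 2 (see [Lu], p. 245). Thus $\{\mathbf{v}^{(1)}, \ldots, \mathbf{v}^{(k)}\}$ is an A-orthogonal set.
Having chosen $\mathbf{v}^{(k)}$, we compute

\begin{align*}
t_k = \frac{\langle \mathbf{v}^{(k)}, \mathbf{r}^{(k-1)} \rangle}{\langle \mathbf{v}^{(k)}, A\mathbf{v}^{(k)} \rangle} &= \frac{\langle \mathbf{r}^{(k-1)} + s_{k-1}\mathbf{v}^{(k-1)}, \mathbf{r}^{(k-1)} \rangle}{\langle \mathbf{v}^{(k)}, A\mathbf{v}^{(k)} \rangle} \\
&= \frac{\langle \mathbf{r}^{(k-1)}, \mathbf{r}^{(k-1)} \rangle}{\langle \mathbf{v}^{(k)}, A\mathbf{v}^{(k)} \rangle} + s_{k-1}\frac{\langle \mathbf{v}^{(k-1)}, \mathbf{r}^{(k-1)} \rangle}{\langle \mathbf{v}^{(k)}, A\mathbf{v}^{(k)} \rangle}.
\end{align*}

By Theorem 7.33, $\langle \mathbf{v}^{(k-1)}, \mathbf{r}^{(k-1)} \rangle = 0$, so

\[ t_k = \frac{\langle \mathbf{r}^{(k-1)}, \mathbf{r}^{(k-1)} \rangle}{\langle \mathbf{v}^{(k)}, A\mathbf{v}^{(k)} \rangle}. \tag{7.30} \]
Thus
\[ \mathbf{x}^{(k)} = \mathbf{x}^{(k-1)} + t_k\mathbf{v}^{(k)}. \]
To compute $\mathbf{r}^{(k)}$, we multiply by A and subtract $\mathbf{b}$ to obtain
\[ A\mathbf{x}^{(k)} - \mathbf{b} = A\mathbf{x}^{(k-1)} - \mathbf{b} + t_k A\mathbf{v}^{(k)} \]
or
\[ \mathbf{r}^{(k)} = \mathbf{r}^{(k-1)} - t_k A\mathbf{v}^{(k)}. \]
This gives
\[ \langle \mathbf{r}^{(k)}, \mathbf{r}^{(k)} \rangle = \langle \mathbf{r}^{(k-1)}, \mathbf{r}^{(k)} \rangle - t_k \langle A\mathbf{v}^{(k)}, \mathbf{r}^{(k)} \rangle = -t_k \langle \mathbf{r}^{(k)}, A\mathbf{v}^{(k)} \rangle. \]
Further, from Eq. (7.30),
\[ \langle \mathbf{r}^{(k-1)}, \mathbf{r}^{(k-1)} \rangle = t_k \langle \mathbf{v}^{(k)}, A\mathbf{v}^{(k)} \rangle, \]
so
\[ s_k = -\frac{\langle \mathbf{v}^{(k)}, A\mathbf{r}^{(k)} \rangle}{\langle \mathbf{v}^{(k)}, A\mathbf{v}^{(k)} \rangle} = -\frac{\langle \mathbf{r}^{(k)}, A\mathbf{v}^{(k)} \rangle}{\langle \mathbf{v}^{(k)}, A\mathbf{v}^{(k)} \rangle} = \frac{(1/t_k)\langle \mathbf{r}^{(k)}, \mathbf{r}^{(k)} \rangle}{(1/t_k)\langle \mathbf{r}^{(k-1)}, \mathbf{r}^{(k-1)} \rangle} = \frac{\langle \mathbf{r}^{(k)}, \mathbf{r}^{(k)} \rangle}{\langle \mathbf{r}^{(k-1)}, \mathbf{r}^{(k-1)} \rangle}. \]
In summary, we have
\[ \mathbf{r}^{(0)} = \mathbf{b} - A\mathbf{x}^{(0)}; \quad \mathbf{v}^{(1)} = \mathbf{r}^{(0)}; \]
and, for k = 1, 2, ..., n,
\[ t_k = \frac{\langle \mathbf{r}^{(k-1)}, \mathbf{r}^{(k-1)} \rangle}{\langle \mathbf{v}^{(k)}, A\mathbf{v}^{(k)} \rangle}, \quad \mathbf{x}^{(k)} = \mathbf{x}^{(k-1)} + t_k\mathbf{v}^{(k)}, \quad \mathbf{r}^{(k)} = \mathbf{r}^{(k-1)} - t_k A\mathbf{v}^{(k)}, \quad s_k = \frac{\langle \mathbf{r}^{(k)}, \mathbf{r}^{(k)} \rangle}{\langle \mathbf{r}^{(k-1)}, \mathbf{r}^{(k-1)} \rangle}, \]
and

\[ \mathbf{v}^{(k+1)} = \mathbf{r}^{(k)} + s_k\mathbf{v}^{(k)}. \tag{7.31} \]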
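
The recurrences summarized in (7.31) translate almost line for line into code. The following is a minimal sketch in plain Python, with no preconditioning; the function name `conjugate_gradient`, the early-exit tolerance `tol`, and the helper functions are assumptions made for this illustration.

```python
def matvec(A, x):
    """Return the matrix-vector product A x."""
    return [sum(a * xj for a, xj in zip(row, x)) for row in A]


def dot(u, v):
    """Return the inner product <u, v>."""
    return sum(ui * vi for ui, vi in zip(u, v))


def conjugate_gradient(A, b, x0, tol=1e-12):
    """Run at most n steps of the recurrences in (7.31)."""
    n = len(b)
    x = list(x0)
    r = [bi - axi for bi, axi in zip(b, matvec(A, x))]   # r(0) = b - A x(0)
    v = list(r)                                          # v(1) = r(0)
    rr = dot(r, r)
    for _ in range(n):
        if rr ** 0.5 < tol:                              # residual already negligible
            break
        Av = matvec(A, v)
        t = rr / dot(v, Av)                              # t_k
        x = [xi + t * vi for xi, vi in zip(x, v)]        # x(k)
        r = [ri - t * avi for ri, avi in zip(r, Av)]     # r(k)
        rr_new = dot(r, r)
        s = rr_new / rr                                  # s_k
        v = [ri + s * vi for ri, vi in zip(r, v)]        # v(k+1)
        rr = rr_new
    return x


A = [[4.0, 3.0, 0.0], [3.0, 4.0, -1.0], [0.0, -1.0, 4.0]]
b = [24.0, 30.0, -24.0]
x_cg = conjugate_gradient(A, b, [0.0, 0.0, 0.0])
print(x_cg)
```

Only one multiplication by A is needed per step, since $A\mathbf{v}^{(k)}$ is reused for both $t_k$ and the residual update.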
Preconditioning


Rather than presenting an algorithm for the conjugate gradient method using these formulas,
we extend the method to include preconditioning. If the matrix A is ill-conditioned, the
conjugate gradient method is highly susceptible to rounding errors. So, although the exact
answer should be obtained in n steps, this is not usually the case. As a direct method the
conjugate gradient method is not as good as Gaussian elimination with pivoting. The main
use of the conjugate gradient method is as an iterative method applied to a better-conditioned
system. In this case an acceptable approximate solution is often obtained in about $\sqrt{n}$ steps.

When preconditioning is used, the conjugate gradient method is not applied directly
to the matrix A but to another positive definite matrix that has a smaller condition number. We
need to do this in such a way that once the solution to this new system is found it will be
easy to obtain the solution to the original system. The expectation is that this will reduce
the rounding error when the method is applied. To maintain the positive definiteness of the
resulting matrix, we need to multiply on each side by a nonsingular matrix. We will denote
this matrix by $C^{-1}$, and consider

\[ \tilde{A} = C^{-1}A(C^{-1})^t, \]

with the hope that $\tilde{A}$ has a lower condition number than A. To simplify the notation, we
use the matrix notation $C^{-t} \equiv (C^{-1})^t$. Later in the section we will see a reasonable way to
select C, but first we will consider the conjugate gradient method applied to $\tilde{A}$.
Consider the linear system

\[ \tilde{A}\tilde{\mathbf{x}} = \tilde{\mathbf{b}}, \]
where $\tilde{\mathbf{x}} = C^t\mathbf{x}$ and $\tilde{\mathbf{b}} = C^{-1}\mathbf{b}$. Then
\[ \tilde{A}\tilde{\mathbf{x}} = (C^{-1}AC^{-t})(C^t\mathbf{x}) = C^{-1}A\mathbf{x}. \]

Thus, we could solve $\tilde{A}\tilde{\mathbf{x}} = \tilde{\mathbf{b}}$ for $\tilde{\mathbf{x}}$ and then obtain $\mathbf{x}$ by multiplying by $C^{-t}$. However,
instead of rewriting equations (7.31) using $\tilde{\mathbf{r}}^{(k)}$, $\tilde{\mathbf{v}}^{(k)}$, $\tilde{t}_k$, $\tilde{\mathbf{x}}^{(k)}$, and $\tilde{s}_k$, we incorporate the
preconditioning implicitly.

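Since

\[ \tilde{\mathbf{x}}^{(k)} = C^t\mathbf{x}^{(k)}, \]
we have
\[ \tilde{\mathbf{r}}^{(k)} = \tilde{\mathbf{b}} - \tilde{A}\tilde{\mathbf{x}}^{(k)} = C^{-1}\mathbf{b} - (C^{-1}AC^{-t})C^t\mathbf{x}^{(k)} = C^{-1}(\mathbf{b} - A\mathbf{x}^{(k)}) = C^{-1}\mathbf{r}^{(k)}. \]
Let $\tilde{\mathbf{v}}^{(k)} = C^t\mathbf{v}^{(k)}$ and $\mathbf{w}^{(k)} = C^{-1}\mathbf{r}^{(k)}$. Then

\[ \tilde{s}_k = \frac{\langle \tilde{\mathbf{r}}^{(k)}, \tilde{\mathbf{r}}^{(k)} \rangle}{\langle \tilde{\mathbf{r}}^{(k-1)}, \tilde{\mathbf{r}}^{(k-1)} \rangle} = \frac{\langle C^{-1}\mathbf{r}^{(k)}, C^{-1}\mathbf{r}^{(k)} \rangle}{\langle C^{-1}\mathbf{r}^{(k-1)}, C^{-1}\mathbf{r}^{(k-1)} \rangle}, \]
so
\[ \tilde{s}_k = \frac{\langle \mathbf{w}^{(k)}, \mathbf{w}^{(k)} \rangle}{\langle \mathbf{w}^{(k-1)}, \mathbf{w}^{(k-1)} \rangle}. \tag{7.32} \]
Thus
\[ \tilde{t}_k = \frac{\langle \tilde{\mathbf{r}}^{(k-1)}, \tilde{\mathbf{r}}^{(k-1)} \rangle}{\langle \tilde{\mathbf{v}}^{(k)}, \tilde{A}\tilde{\mathbf{v}}^{(k)} \rangle} = \frac{\langle C^{-1}\mathbf{r}^{(k-1)}, C^{-1}\mathbf{r}^{(k-1)} \rangle}{\langle C^t\mathbf{v}^{(k)}, C^{-1}AC^{-t}C^t\mathbf{v}^{(k)} \rangle} = \frac{\langle \mathbf{w}^{(k-1)}, \mathbf{w}^{(k-1)} \rangle}{\langle C^t\mathbf{v}^{(k)}, C^{-1}A\mathbf{v}^{(k)} \rangle} \]
and, since
\[ \langle C^t\mathbf{v}^{(k)}, C^{-1}A\mathbf{v}^{(k)} \rangle = [C^t\mathbf{v}^{(k)}]^t C^{-1}A\mathbf{v}^{(k)} = [\mathbf{v}^{(k)}]^t CC^{-1}A\mathbf{v}^{(k)} = [\mathbf{v}^{(k)}]^t A\mathbf{v}^{(k)} = \langle \mathbf{v}^{(k)}, A\mathbf{v}^{(k)} \rangle, \]
we have
\[ \tilde{t}_k = \frac{\langle \mathbf{w}^{(k-1)}, \mathbf{w}^{(k-1)} \rangle}{\langle \mathbf{v}^{(k)}, A\mathbf{v}^{(k)} \rangle}. \tag{7.33} \]
Further,
\[ \tilde{\mathbf{x}}^{(k)} = \tilde{\mathbf{x}}^{(k-1)} + \tilde{t}_k\tilde{\mathbf{v}}^{(k)}, \quad \text{so} \quad C^t\mathbf{x}^{(k)} = C^t\mathbf{x}^{(k-1)} + \tilde{t}_k C^t\mathbf{v}^{(k)} \]
and
\[ \mathbf{x}^{(k)} = \mathbf{x}^{(k-1)} + \tilde{t}_k\mathbf{v}^{(k)}. \tag{7.34} \]
Continuing,
\[ \tilde{\mathbf{r}}^{(k)} = \tilde{\mathbf{r}}^{(k-1)} - \tilde{t}_k\tilde{A}\tilde{\mathbf{v}}^{(k)}, \]
so
\[ C^{-1}\mathbf{r}^{(k)} = C^{-1}\mathbf{r}^{(k-1)} - \tilde{t}_k C^{-1}AC^{-t}\tilde{\mathbf{v}}^{(k)}, \quad \mathbf{r}^{(k)} = \mathbf{r}^{(k-1)} - \tilde{t}_k AC^{-t}C^t\mathbf{v}^{(k)}, \]
and
\[ \mathbf{r}^{(k)} = \mathbf{r}^{(k-1)} - \tilde{t}_k A\mathbf{v}^{(k)}. \tag{7.35} \]
Finally,
\[ \tilde{\mathbf{v}}^{(k+1)} = \tilde{\mathbf{r}}^{(k)} + \tilde{s}_k\tilde{\mathbf{v}}^{(k)} \quad \text{and} \quad C^t\mathbf{v}^{(k+1)} = C^{-1}\mathbf{r}^{(k)} + \tilde{s}_k C^t\mathbf{v}^{(k)}, \]
so
\[ \mathbf{v}^{(k+1)} = C^{-t}C^{-1}\mathbf{r}^{(k)} + \tilde{s}_k\mathbf{v}^{(k)} = C^{-t}\mathbf{w}^{(k)} + \tilde{s}_k\mathbf{v}^{(k)}. \tag{7.36} \]

The preconditioned conjugate gradient method is based on using equations (7.32)-(7.36)
in the order (7.33), (7.34), (7.35), (7.32), and (7.36). Algorithm 7.5 implements this
procedure.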


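Preconditioned Conjugate Gradient Method
To solve $A\mathbf{x} = \mathbf{b}$ given the preconditioning matrix $C^{-1}$ and the initial approximation $\mathbf{x}^{(0)}$:


INPUT the number of equations and unknowns n; the entries $a_{ij}$, $1 \le i, j \le n$ of the
matrix A; the entries $b_j$, $1 \le j \le n$ of the vector $\mathbf{b}$; the entries $\gamma_{ij}$, $1 \le i, j \le n$ of the
preconditioning matrix $C^{-1}$; the entries $x_i$, $1 \le i \le n$ of the initial approximation $\mathbf{x} = \mathbf{x}^{(0)}$;
the maximum number of iterations N; tolerance TOL.


OUTPUT the approximate solution $x_1, \ldots, x_n$ and the residual $r_1, \ldots, r_n$ or a message that
the number of iterations was exceeded.

Step 1 Set $\mathbf{r} = \mathbf{b} - A\mathbf{x}$; (Compute $\mathbf{r}^{(0)}$.)
           $\mathbf{w} = C^{-1}\mathbf{r}$; (Note: $\mathbf{w} = \mathbf{w}^{(0)}$)
           $\mathbf{v} = C^{-t}\mathbf{w}$; (Note: $\mathbf{v} = \mathbf{v}^{(1)}$)
           $\alpha = \sum_{j=1}^{n} w_j^2$.

Step 2 Set k = 1.

Step 3 While ($k \le N$) do Steps 4-7.

    Step 4 If $\|\mathbf{v}\| < TOL$, then
               OUTPUT ('Solution vector'; $x_1, \ldots, x_n$);
               OUTPUT ('with residual'; $r_1, \ldots, r_n$);
               (The procedure was successful.)
               STOP.

    Step 5 Set $\mathbf{u} = A\mathbf{v}$; (Note: $\mathbf{u} = A\mathbf{v}^{(k)}$)
               $t = \alpha / \sum_{j=1}^{n} v_j u_j$; (Note: $t = t_k$)
               $\mathbf{x} = \mathbf{x} + t\mathbf{v}$; (Note: $\mathbf{x} = \mathbf{x}^{(k)}$)
               $\mathbf{r} = \mathbf{r} - t\mathbf{u}$; (Note: $\mathbf{r} = \mathbf{r}^{(k)}$)
               $\mathbf{w} = C^{-1}\mathbf{r}$; (Note: $\mathbf{w} = \mathbf{w}^{(k)}$)
               $\beta = \sum_{j=1}^{n} w_j^2$. (Note: $\beta = \langle \mathbf{w}^{(k)}, \mathbf{w}^{(k)} \rangle$)

    Step 6 If $|\beta| < TOL$ then
               if $\|\mathbf{r}\| < TOL$ then
                   OUTPUT ('Solution vector'; $x_1, \ldots, x_n$);
                   OUTPUT ('with residual'; $r_1, \ldots, r_n$);
                   (The procedure was successful.)
                   STOP.

    Step 7 Set $s = \beta/\alpha$; ($s = s_k$)
               $\mathbf{v} = C^{-t}\mathbf{w} + s\mathbf{v}$; (Note: $\mathbf{v} = \mathbf{v}^{(k+1)}$)
               $\alpha = \beta$; (Update $\alpha$.)
               k = k + 1.

Step 8 If ($k > N$) then
           OUTPUT ('The maximum number of iterations was exceeded.');
           (The procedure was unsuccessful.)
           STOP.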
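
The steps above can be sketched in Python as follows. This is one possible reading of the algorithm rather than a reference implementation: the norm in Steps 4 and 6 is taken to be the infinity norm (the algorithm does not name one), the function returns a tuple instead of printing, and the names `preconditioned_cg`, `norm_inf`, and the other helpers are assumptions introduced here.

```python
def matvec(M, x):
    """Return the matrix-vector product M x."""
    return [sum(m * xj for m, xj in zip(row, x)) for row in M]


def dot(u, v):
    """Return the inner product <u, v>."""
    return sum(ui * vi for ui, vi in zip(u, v))


def norm_inf(x):
    """Infinity norm (assumed here; the algorithm leaves the norm unspecified)."""
    return max(abs(xi) for xi in x)


def preconditioned_cg(A, b, C_inv, x0, max_iter, tol):
    """Return (x, r, iterations) or raise if max_iter is exceeded."""
    C_inv_t = [list(col) for col in zip(*C_inv)]           # C^{-t}
    x = list(x0)
    r = [bi - axi for bi, axi in zip(b, matvec(A, x))]     # Step 1
    w = matvec(C_inv, r)
    v = matvec(C_inv_t, w)
    alpha = dot(w, w)
    k = 1                                                  # Step 2
    while k <= max_iter:                                   # Step 3
        if norm_inf(v) < tol:                              # Step 4
            return x, r, k - 1
        u = matvec(A, v)                                   # Step 5
        t = alpha / dot(v, u)
        x = [xi + t * vi for xi, vi in zip(x, v)]
        r = [ri - t * ui for ri, ui in zip(r, u)]
        w = matvec(C_inv, r)
        beta = dot(w, w)
        if abs(beta) < tol and norm_inf(r) < tol:          # Step 6
            return x, r, k
        s = beta / alpha                                   # Step 7
        cw = matvec(C_inv_t, w)
        v = [ci + s * vi for ci, vi in zip(cw, v)]
        alpha = beta
        k += 1
    raise RuntimeError("The maximum number of iterations was exceeded.")  # Step 8


A = [[4.0, 3.0, 0.0], [3.0, 4.0, -1.0], [0.0, -1.0, 4.0]]
b = [24.0, 30.0, -24.0]
identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
x_pcg, r_pcg, iterations = preconditioned_cg(A, b, identity, [0.0, 0.0, 0.0], 10, 1e-6)
print(x_pcg, iterations)
```

With $C^{-1} = I$ the routine reduces to the unpreconditioned conjugate gradient method, which is the setting used in the example that follows.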


The next example illustrates the calculations for an elementary problem. 


Example 2 The linear system $A\mathbf{x} = \mathbf{b}$ given by

\begin{align*}
4x_1 + 3x_2 \phantom{{}- x_3} &= 24, \\
3x_1 + 4x_2 - x_3 &= 30, \\
-x_2 + 4x_3 &= -24
\end{align*}

has solution $(3, 4, -5)^t$. Use the conjugate gradient method with $\mathbf{x}^{(0)} = (0, 0, 0)^t$ and no
preconditioning, that is, with $C = C^{-1} = I$, to approximate the solution.


Solution The solution was considered in Example 2 of Section 7.4, where the SOR method
was used with a nearly optimal value of $\omega = 1.25$.
For the conjugate gradient method we start with
\begin{align*}
\mathbf{r}^{(0)} &= \mathbf{b} - A\mathbf{x}^{(0)} = \mathbf{b} = (24, 30, -24)^t; \\
\mathbf{w} &= C^{-1}\mathbf{r}^{(0)} = (24, 30, -24)^t; \\
\mathbf{v}^{(1)} &= C^{-t}\mathbf{w} = (24, 30, -24)^t; \\
\alpha &= \langle \mathbf{w}, \mathbf{w} \rangle = 2052.
\end{align*}
We start the first iteration with k = 1. Then
\begin{align*}
\mathbf{u} &= A\mathbf{v}^{(1)} = (186.0, 216.0, -126.0)^t; \\
t_1 &= \frac{\alpha}{\langle \mathbf{v}^{(1)}, \mathbf{u} \rangle} = 0.1469072165; \\
\mathbf{x}^{(1)} &= \mathbf{x}^{(0)} + t_1\mathbf{v}^{(1)} = (3.525773196, 4.407216495, -3.525773196)^t; \\
\mathbf{r}^{(1)} &= \mathbf{r}^{(0)} - t_1\mathbf{u} = (-3.32474227, -1.73195876, -5.48969072)^t; \\
\mathbf{w} &= C^{-1}\mathbf{r}^{(1)} = \mathbf{r}^{(1)}; \\
\beta &= \langle \mathbf{w}, \mathbf{w} \rangle = 44.19029651; \\
s_1 &= \frac{\beta}{\alpha} = 0.02153523222; \\
\mathbf{v}^{(2)} &= C^{-t}\mathbf{w} + s_1\mathbf{v}^{(1)} = (-2.807896697, -1.085901793, -6.006536293)^t.
\end{align*}

Set
\[ \alpha = \beta = 44.19029651. \]

For the second iteration we have
\begin{align*}
\mathbf{u} &= A\mathbf{v}^{(2)} = (-14.48929217, -6.760760967, -22.94024338)^t; \\
t_2 &= 0.2378157558; \\
\mathbf{x}^{(2)} &= (2.858011121, 4.148971939, -4.954222164)^t; \\
\mathbf{r}^{(2)} &= (0.121039698, -0.124143281, -0.034139402)^t; \\
\mathbf{w} &= C^{-1}\mathbf{r}^{(2)} = \mathbf{r}^{(2)}; \\
\beta &= 0.03122766148; \\
s_2 &= 0.0007066633163; \\
\mathbf{v}^{(3)} &= (0.1190554504, -0.1249106480, -0.03838400086)^t.
\end{align*}

Set $\alpha = \beta = 0.03122766148$.
The third iteration gives
\begin{align*}
\mathbf{u} &= A\mathbf{v}^{(3)} = (0.1014898976, -0.1040922099, -0.0286253554)^t; \\
t_3 &= 1.192628008; \\
\mathbf{x}^{(3)} &= (2.999999998, 4.000000002, -4.999999998)^t; \\
\mathbf{r}^{(3)} &= (0.36 \times 10^{-8}, 0.39 \times 10^{-8}, -0.141 \times 10^{-8})^t.
\end{align*}

Since $\mathbf{x}^{(3)}$ is nearly the exact solution, rounding error did not significantly affect the
result. In Example 2 of Section 7.4, the SOR method with $\omega = 1.25$ required 14 iterations
for an accuracy of $10^{-7}$. It should be noted, however, that in this example, we are really
comparing a direct method to iterative methods.


The next example illustrates the effect of preconditioning on a poorly conditioned
matrix. In this example, we use $D^{-1/2}$ to represent the diagonal matrix whose entries are the
reciprocals of the square roots of the diagonal entries of the coefficient matrix A. This is used
as the preconditioner. Because the matrix A is positive definite we expect the eigenvalues
of $D^{-1/2}AD^{-t/2}$ to be close to 1, with the result that the condition number of this matrix
will be small relative to the condition number of A.


Example 3 Use Maple to find the eigenvalues and condition number of the matrix

\[ A = \begin{bmatrix} 0.2 & 0.1 & 1 & 1 & 0 \\ 0.1 & 4 & -1 & 1 & -1 \\ 1 & -1 & 60 & 0 & -2 \\ 1 & 1 & 0 & 8 & 4 \\ 0 & -1 & -2 & 4 & 700 \end{bmatrix} \]

and compare these with the eigenvalues and condition number of the preconditioned matrix
$D^{-1/2}AD^{-t/2}$.


Solution We first need to load the LinearAlgebra package and then enter the matrix.


with(LinearAlgebra):
A := Matrix([[0.2, 0.1, 1, 1, 0], [0.1, 4, -1, 1, -1], [1, -1, 60, 0, -2],
[1, 1, 0, 8, 4], [0, -1, -2, 4, 700]])
To determine the preconditioned matrix we first need the diagonal matrix, which, being
symmetric, is also its transpose. Its diagonal entries are specified by
\[ a1 := \frac{1}{\sqrt{0.2}}; \quad a2 := \frac{1}{\sqrt{4.0}}; \quad a3 := \frac{1}{\sqrt{60.0}}; \quad a4 := \frac{1}{\sqrt{8.0}}; \quad a5 := \frac{1}{\sqrt{700.0}} \]

and the preconditioning matrix is
CI := Matrix([[a1, 0, 0, 0, 0], [0, a2, 0, 0, 0], [0, 0, a3, 0, 0], [0, 0, 0, a4, 0], [0, 0, 0, 0, a5]])


which Maple returns as

\[ \begin{bmatrix} 2.23607 & 0 & 0 & 0 & 0 \\ 0 & 0.500000 & 0 & 0 & 0 \\ 0 & 0 & 0.129099 & 0 & 0 \\ 0 & 0 & 0 & 0.353553 & 0 \\ 0 & 0 & 0 & 0 & 0.0377965 \end{bmatrix} \]

The preconditioned matrix is


AH := CI.A.Transpose(CI)
\[ \begin{bmatrix} 1.000002 & 0.1118035 & 0.2886744 & 0.7905693 & 0 \\ 0.1118035 & 1 & -0.0645495 & 0.1767765 & -0.0188983 \\ 0.2886744 & -0.0645495 & 0.9999931 & 0 & -0.00975898 \\ 0.7905693 & 0.1767765 & 0 & 0.9999964 & 0.05345219 \\ 0 & -0.0188983 & -0.00975898 & 0.05345219 & 1.000005 \end{bmatrix} \]

The eigenvalues of A and AH are found with


Eigenvalues(A); Eigenvalues(AH)
Maple gives these as
Eigenvalues of A: 700.031, 60.0284, 0.0570747, 8.33845, 3.74533
Eigenvalues of AH: 1.88052, 0.156370, 0.852686, 1.10159, 1.00884
The condition numbers of A and AH in the $l_\infty$ norm are found with
ConditionNumber(A); ConditionNumber(AH)
which Maple gives as 13961.7 for A and 16.1155 for AH. It is certainly true in this case that
AH is better conditioned than the original matrix A.
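
For readers without Maple, the scaling step can be reproduced with a few lines of standard-library Python. This is an illustrative sketch only: it forms $D^{-1/2}AD^{-1/2}$ directly from the diagonal of A and estimates the largest eigenvalue by the power method rather than calling an eigenvalue routine; the names `AH`, `largest_eigenvalue`, and `lam_max` are assumptions made here.

```python
import math

A = [
    [0.2, 0.1, 1.0, 1.0, 0.0],
    [0.1, 4.0, -1.0, 1.0, -1.0],
    [1.0, -1.0, 60.0, 0.0, -2.0],
    [1.0, 1.0, 0.0, 8.0, 4.0],
    [0.0, -1.0, -2.0, 4.0, 700.0],
]
n = len(A)

# Diagonal entries of D^{-1/2}: reciprocals of the square roots of diag(A).
d = [1.0 / math.sqrt(A[i][i]) for i in range(n)]

# Preconditioned matrix: entry (i, j) is d_i * a_ij * d_j.
AH = [[d[i] * A[i][j] * d[j] for j in range(n)] for i in range(n)]


def largest_eigenvalue(M, iterations=500):
    """Estimate the dominant eigenvalue of a symmetric matrix by the power method."""
    x = [1.0] * len(M)
    for _ in range(iterations):
        y = [sum(m * xj for m, xj in zip(row, x)) for row in M]
        norm = math.sqrt(sum(yi * yi for yi in y))
        x = [yi / norm for yi in y]
    Mx = [sum(m * xj for m, xj in zip(row, x)) for row in M]
    return sum(xi * mi for xi, mi in zip(x, Mx))   # Rayleigh quotient, ||x|| = 1


lam_max = largest_eigenvalue(AH)
print(AH[0][3], lam_max)
```

The diagonal of the scaled matrix is 1 up to rounding, so the five eigenvalues sum to 5, in agreement with the values reported by Maple.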


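The linear system $A\mathbf{x} = \mathbf{b}$ with

\[ A = \begin{bmatrix} 0.2 & 0.1 & 1 & 1 & 0 \\ 0.1 & 4 & -1 & 1 & -1 \\ 1 & -1 & 60 & 0 & -2 \\ 1 & 1 & 0 & 8 & 4 \\ 0 & -1 & -2 & 4 & 700 \end{bmatrix} \quad \text{and} \quad \mathbf{b} = \begin{bmatrix} 1 \\ 2 \\ 3 \\ 4 \\ 5 \end{bmatrix} \]

has the solution
\[ \mathbf{x}^* = (7.859713071, 0.4229264082, -0.07359223906, -0.5406430164, 0.01062616286)^t. \]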


Table 7.5 lists the results obtained by using the Jacobi, Gauss-Seidel, and SOR (with $\omega = 1.25$)
iterative methods applied to the system with A with a tolerance of 0.01, as well as
those when the Conjugate Gradient method is applied both in its unpreconditioned form
and using the preconditioning matrix described in Example 3. The preconditioned conjugate
gradient method not only gives the most accurate approximations, it also uses the smallest
number of iterations.


Table 7.5

Method                                Number of    $\mathbf{x}^{(k)}$                                                                 $\|\mathbf{x}^* - \mathbf{x}^{(k)}\|_\infty$
                                      Iterations
Jacobi                                49           (7.86277141, 0.42320802, -0.07348669, -0.53975964, 0.01062847)^t     0.00305834
Gauss-Seidel                          15           (7.83525748, 0.42257868, -0.07319124, -0.53753055, 0.01060903)^t     0.02445559
SOR ($\omega = 1.25$)                 7            (7.85152706, 0.42277371, -0.07348303, -0.53978369, 0.01062286)^t     0.00818607
Conjugate Gradient                    5            (7.85341523, 0.42298677, -0.07347963, -0.53987920, 0.008628916)^t    0.00629785
Conjugate Gradient (Preconditioned)   4            (7.85968827, 0.42288329, -0.07359878, -0.54063200, 0.01064344)^t     0.00009312


The preconditioned conjugate gradient method is often used in the solution of large
linear systems in which the matrix is sparse and positive definite. These systems must be
solved to approximate solutions to boundary-value problems in ordinary-differential equations
(Sections 11.3, 11.4, 11.5). The larger the system, the more impressive the conjugate
gradient method becomes because it significantly reduces the number of iterations required.
In these systems, the preconditioning matrix C is approximately equal to L in the Cholesky
factorization $LL^t$ of A. Generally, small entries in A are ignored and Cholesky's method is applied
to obtain what is called an incomplete $LL^t$ factorization of A. Thus, $C^{-t}C^{-1} \approx A^{-1}$ and
a good approximation is obtained. More information about the conjugate gradient method
can be found in [Kelley].


EXERCISE SET 7.6


1. The linear system

\begin{align*}
x_1 + \tfrac{1}{2}x_2 &= \tfrac{5}{21}, \\
\tfrac{1}{2}x_1 + \tfrac{1}{3}x_2 &= \tfrac{11}{84}
\end{align*}

has solution $(x_1, x_2)^t = (1/6, 1/7)^t$.
a. Solve the linear system using Gaussian elimination with two-digit rounding arithmetic.
b. Solve the linear system using the conjugate gradient method ($C = C^{-1} = I$) with two-digit
rounding arithmetic.
c. Which method gives the better answer?
d. Choose $C^{-1} = D^{-1/2}$. Does this choice improve the conjugate gradient method?

2. The linear system
\begin{align*}
0.1x_1 + 0.2x_2 &= 0.3, \\
0.2x_1 + 113x_2 &= 113.2
\end{align*}

has solution $(x_1, x_2)^t = (1, 1)^t$. Repeat the directions for Exercise 1 on this linear system.

3. The linear system

\begin{align*}
x_1 + \tfrac{1}{2}x_2 + \tfrac{1}{3}x_3 &= \tfrac{5}{6}, \\
\tfrac{1}{2}x_1 + \tfrac{1}{3}x_2 + \tfrac{1}{4}x_3 &= \tfrac{5}{12}, \\
\tfrac{1}{3}x_1 + \tfrac{1}{4}x_2 + \tfrac{1}{5}x_3 &= \tfrac{17}{60}
\end{align*}

has solution $(1, -1, 1)^t$.
a. Solve the linear system using Gaussian elimination with three-digit rounding arithmetic.
b. Solve the linear system using the conjugate gradient method with three-digit rounding arithmetic.
c. Does pivoting improve the answer in (a)?
d. Repeat part (b) using $C^{-1} = D^{-1/2}$. Does this improve the answer in (b)?

4. Repeat Exercise 3 using single-precision arithmetic on a computer.


5. Perform only two steps of the conjugate gradient method with $C = C^{-1} = I$ on each of the following
linear systems. Compare the results in parts (b) and (c) to the results obtained in parts (b) and (c) of
Exercise 1 of Section 7.3 and Exercise 1 of Section 7.4.


a. 3x, -— M+ B= 1, b. 10x;}- x = 9: 
—xX, + 6x7 alr 2x3 = 0, —xXy + 10x = 2x3 = Te 
x + 2x9 + 7x3 = 4. = 2X + 10x3 = 6. 
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ce. 10x, + 5x2 = 6, da 444+ 2% -— 234+ % = -2, 

5x, + 10x. — 4x3 = 25, x, +4%.- 93 - xy =, 
— 4H54+83- 4 =-lI1, —xX}— %+5%34+ x4 =0, 
— %+34+5x,=-11. X—- H+ B4+3x,= 1. 

e 44+ 2+ 2+ x5 = 6, f. 4x, -— x — x4 = 0, 
Xp t3x.+ x3+ % = 6, —x, +4 -— 23 — X5 =5, 
Xp + %1+5%3- Xy- X5 = 6, — X2 + 4x3 — x =0, 

X— x3 + 4x4 = 6, Xx) + 4x4 —-— X5 = 6, 
xX} — xw24+ + 4x5 = 6. — X — xm +4%5-— x% = 2, 
— x3 — x5+ 4% = 6. 


6. Repeat Exercise 5 using $C^{-1} = D^{-1/2}$.

7. Repeat Exercise 5 with TOL = $10^{-3}$ in the $l_\infty$ norm. Compare the results in parts (b) and (c) to those
obtained in Exercises 5 and 7 of Section 7.3 and Exercise 5 of Section 7.4.

8. Repeat Exercise 7 using $C^{-1} = D^{-1/2}$.


9. Approximate solutions to the following linear systems Ax = b to within 1079 in the /,, norm. 


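(i)
\[ a_{i,j} = \begin{cases} 4, & \text{when } j = i \text{ and } i = 1, 2, \ldots, 16, \\ -1, & \text{when } \begin{cases} j = i + 1 \text{ and } i = 1, 2, 3, 5, 6, 7, 9, 10, 11, 13, 14, 15, \\ j = i - 1 \text{ and } i = 2, 3, 4, 6, 7, 8, 10, 11, 12, 14, 15, 16, \\ j = i + 4 \text{ and } i = 1, 2, \ldots, 12, \\ j = i - 4 \text{ and } i = 5, 6, \ldots, 16, \end{cases} \\ 0, & \text{otherwise} \end{cases} \]
and

b = (1.902207, 1.051143, 1.175689, 3.480083, 0.819600, -0.264419,
-0.412789, 1.175689, 0.913337, -0.150209, -0.264419, 1.051143,
1.966694, 0.913337, 0.819600, 1.902207)^t

(ii)
\[ a_{i,j} = \begin{cases} 4, & \text{when } j = i \text{ and } i = 1, 2, \ldots, 25, \\ -1, & \text{when } \begin{cases} j = i + 1 \text{ and } i = 1, 2, 3, 4, 6, 7, 8, 9, 11, 12, 13, 14, 16, 17, 18, 19, 21, 22, 23, 24, \\ j = i - 1 \text{ and } i = 2, 3, 4, 5, 7, 8, 9, 10, 12, 13, 14, 15, 17, 18, 19, 20, 22, 23, 24, 25, \\ j = i + 5 \text{ and } i = 1, 2, \ldots, 20, \\ j = i - 5 \text{ and } i = 6, 7, \ldots, 25, \end{cases} \\ 0, & \text{otherwise} \end{cases} \]
and

b = (1, 0, -1, 0, 2, 1, 0, -1, 0, 2, 1, 0, -1, 0, 2, 1, 0, -1, 0, 2, 1, 0, -1, 0, 2)^t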


(iii)
\[ a_{i,j} = \begin{cases} 2i, & \text{when } j = i \text{ and } i = 1, 2, \ldots, 40, \\ -1, & \text{when } \begin{cases} j = i + 1 \text{ and } i = 1, 2, \ldots, 39, \\ j = i - 1 \text{ and } i = 2, 3, \ldots, 40, \end{cases} \\ 0, & \text{otherwise} \end{cases} \]
and $b_i = 1.5i - 6$, for each i = 1, 2, ..., 40.
a. Use the Jacobi method.
b. Use the Gauss-Seidel method.
c. Use the SOR method with $\omega = 1.3$ in (i), $\omega = 1.2$ in (ii), and $\omega = 1.1$ in (iii).
d. Use the conjugate gradient method and preconditioning with $C^{-1} = D^{-1/2}$.

10. Solve the linear system in Exercise 16(b) of Exercise Set 7.3 using the conjugate gradient method
with $C^{-1} = I$.


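11. Let

\[ A_1 = \begin{bmatrix} 4 & -1 & 0 & 0 \\ -1 & 4 & -1 & 0 \\ 0 & -1 & 4 & -1 \\ 0 & 0 & -1 & 4 \end{bmatrix}, \quad -I = \begin{bmatrix} -1 & 0 & 0 & 0 \\ 0 & -1 & 0 & 0 \\ 0 & 0 & -1 & 0 \\ 0 & 0 & 0 & -1 \end{bmatrix}, \quad \text{and} \quad 0 = \begin{bmatrix} 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \end{bmatrix}. \]

Form the 16 x 16 matrix A in partitioned form,

\[ A = \begin{bmatrix} A_1 & -I & 0 & 0 \\ -I & A_1 & -I & 0 \\ 0 & -I & A_1 & -I \\ 0 & 0 & -I & A_1 \end{bmatrix}. \]

Let $\mathbf{b} = (1, 2, 3, 4, 5, 6, 7, 8, 9, 0, 1, 2, 3, 4, 5, 6)^t$.
a. Solve $A\mathbf{x} = \mathbf{b}$ using the conjugate gradient method with tolerance 0.05.
b. Solve $A\mathbf{x} = \mathbf{b}$ using the preconditioned conjugate gradient method with $C^{-1} = D^{-1/2}$ and
tolerance 0.05.
c. Is there any tolerance for which the methods of part (a) and part (b) require a different number
of iterations?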


12. Use the transpose properties given in Theorem 6.14 on page 390 to prove Theorem 7.30.

13. a. Show that an A-orthogonal set of nonzero vectors associated with a positive definite matrix is
linearly independent.
b. Show that if $\{\mathbf{v}^{(1)}, \mathbf{v}^{(2)}, \ldots, \mathbf{v}^{(n)}\}$ is a set of A-orthogonal nonzero vectors in $\mathbb{R}^n$ and $\mathbf{z}^t\mathbf{v}^{(i)} = 0$, for
each i = 1, 2, ..., n, then $\mathbf{z} = \mathbf{0}$.

14. Prove Theorem 7.33 using mathematical induction as follows:
a. Show that $\langle \mathbf{r}^{(1)}, \mathbf{v}^{(1)} \rangle = 0$.
b. Assume that $\langle \mathbf{r}^{(k)}, \mathbf{v}^{(j)} \rangle = 0$, for each $k \le l$ and j = 1, 2, ..., k, and show that this implies that
$\langle \mathbf{r}^{(l+1)}, \mathbf{v}^{(j)} \rangle = 0$, for each j = 1, 2, ..., l.
c. Show that $\langle \mathbf{r}^{(l+1)}, \mathbf{v}^{(l+1)} \rangle = 0$.

15. In Example 3 the eigenvalues were found for the matrix A and the conditioned matrix AH. Use these
to determine the condition numbers of A and AH in the $l_2$ norm, and compare your results to those
given with the Maple commands ConditionNumber(A,2) and ConditionNumber(AH,2).
7.7 Survey of Methods and Software


In this chapter we studied iterative techniques to approximate the solution of linear systems.
We began with the Jacobi method and the Gauss-Seidel method to introduce the iterative
methods. Both methods require an arbitrary initial approximation $\mathbf{x}^{(0)}$ and generate a sequence
of vectors $\mathbf{x}^{(i+1)}$ using an equation of the form

\[ \mathbf{x}^{(i+1)} = T\mathbf{x}^{(i)} + \mathbf{c}. \]

It was noted that the method will converge if and only if the spectral radius of the iteration
matrix $\rho(T) < 1$, and the smaller the spectral radius, the faster the convergence. Analysis
of the residual vectors of the Gauss-Seidel technique led to the SOR iterative method, which
involves a parameter $\omega$ to speed convergence.

These iterative methods and modifications are used extensively in the solution of linear
systems that arise in the numerical solution of boundary value problems and partial
differential equations (see Chapters 11 and 12). These systems are often very large, on the
order of 10,000 equations in 10,000 unknowns, and are sparse with their nonzero entries in
predictable positions. The iterative methods are also useful for other large sparse systems
and are easily adapted for efficient use on parallel computers.

Almost all commercial and public domain packages that contain iterative methods for
the solution of a linear system of equations require a preconditioner to be used with the
method. Faster convergence of iterative solvers is often achieved by using a preconditioner.
A preconditioner produces an equivalent system of equations that hopefully exhibits better
convergence characteristics than the original system. The IMSL Library has a preconditioned
conjugate gradient method, and the NAG Library has several subroutines for the
iterative solution of linear systems.

Aleksei Nikolaevich Krylov (1863-1945) worked in applied mathematics, primarily in the
areas of boundary value problems, the acceleration of convergence of Fourier series,
and various classical problems involving mechanical systems. During the early 1930s he was
the Director of the Physics-Mathematics Institute of the Soviet Academy of Sciences.

All of the subroutines are based on Krylov subspaces. Saad [Sa2] has a detailed de- 
scription of Krylov subspace methods. The packages LINPACK and LAPACK contain only 
direct methods for the solution of linear systems; however, the packages do contain many 
subroutines that are used by the iterative solvers. The public domain packages IML++, 
ITPACK, SLAP, and Templates, contain iterative methods. MATLAB contains several iter- 
ative methods that are also based on Krylov subspaces. 

The concepts of condition number and poorly conditioned matrices were introduced in
Section 7.5. Many of the subroutines for solving a linear system or for factoring a matrix into
an LU factorization include checks for ill-conditioned matrices and also give an estimate
of the condition number. LAPACK has numerous routines that include the estimate of a
condition number, as do the IMSL and NAG libraries.

LAPACK, LINPACK, the IMSL Library, and the NAG Library have subroutines that 
improve on a solution to a linear system that is poorly conditioned. The subroutines test 
the condition number and then use iterative refinement to obtain the most accurate solution 
possible given the precision of the computer. 

More information on the use of iterative methods for solving linear systems can be 
found in Varga [Var1], Young [Y], Hageman and Young [HY], and Axelsson [Ax]. Iterative 
methods for large sparse systems are discussed in Barrett et al [Barr], Hackbusch [Hac], 
Kelley [Kelley], and Saad [Sa2]. 
Approximation Theory


Introduction 


Hooke's law states that when a force is applied to a spring constructed of uniform material,
the length of the spring is a linear function of that force. We can write the linear function
as $F(l) = k(l - E)$, where $F(l)$ represents the force required to stretch the spring $l$ units,
the constant E represents the length of the spring with no force applied, and the constant k
is the spring constant.


[Figure: graph of the spring length $l$ against the applied force $F(l) = k(l - E)$.]


Suppose we want to determine the spring constant for a spring that has initial length
5.3 in. We apply forces of 2, 4, and 6 lb to the spring and find that its length increases to 7.0,
9.4, and 12.3 in., respectively. A quick examination shows that the points (0, 5.3), (2, 7.0),
(4, 9.4), and (6, 12.3) do not quite lie in a straight line. Although we could use a random
pair of these data points to approximate the spring constant, it would seem more reasonable
to find the line that best approximates all the data points to determine the constant. This
type of approximation will be considered in this chapter, and this spring application can be
found in Exercise 7 of Section 8.1.

Approximation theory involves two general types of problems. One problem arises 
when a function is given explicitly, but we wish to find a “simpler” type of function, 
such as a polynomial, to approximate values of the given function. The other problem in 
approximation theory is concerned with fitting functions to given data and finding the “best” 
function in a certain class to represent the data. 

Both problems have been touched upon in Chapter 3. The nth Taylor polynomial about
the number $x_0$ is an excellent approximation to an (n + 1)-times differentiable function f
in a small neighborhood of $x_0$. The Lagrange interpolating polynomials, or, more generally,
osculatory polynomials, were discussed both as approximating polynomials and as polynomials
to fit certain data. Cubic splines were also discussed in that chapter. In this chapter,
limitations to these techniques are considered, and other avenues of approach are discussed.



8.1 Discrete Least Squares Approximation


Table 8.1

x_i    y_i      x_i    y_i
 1     1.3       6     8.8
 2     3.5       7    10.1
 3     4.2       8    12.5
 4     5.0       9    13.0
 5     7.0      10    15.6
Figure 8.1 
Figure 8.2 


Consider the problem of estimating the values of a function at nontabulated points, given 
the experimental data in Table 8.1. 

Figure 8.1 shows a graph of the values in Table 8.1. From this graph, it appears that the 
actual relationship between x and y is linear. The likely reason that no line precisely fits the 
data is because of errors in the data. So it is unreasonable to require that the approximating 
function agree exactly with the data. In fact, such a function would introduce oscillations 
that were not originally present. For example, the graph of the ninth-degree interpolating 
polynomial shown in unconstrained mode for the data in Table 8.1 is obtained in Maple 
using the commands 


p := interp([1, 2, 3, 4, 5, 6, 7, 8, 9, 10], [1.3, 3.5, 4.2, 5.0, 7.0, 8.8, 10.1, 12.5, 13.0, 15.6], x):
plot(p, x = 1..10)


The plot obtained (with the data points added) is shown in Figure 8.2. 
This polynomial is clearly a poor predictor of information between a number of the
data points. A better approach would be to find the "best" (in some sense) approximating
line, even if it does not agree precisely with the data at any point.

Let $a_1x_i + a_0$ denote the ith value on the approximating line and $y_i$ be the ith given
y-value. We assume throughout that the independent variables, the $x_i$, are exact; it is the
dependent variables, the $y_i$, that are suspect. This is a reasonable assumption in most
experimental situations.

The problem of finding the equation of the best linear approximation in the absolute
sense requires that values of $a_0$ and $a_1$ be found to minimize

\[ E_\infty(a_0, a_1) = \max_{1 \le i \le 10} \{ |y_i - (a_1x_i + a_0)| \}. \]


This is commonly called a minimax problem and cannot be handled by elementary tech- 
niques. 

Another approach to determining the best linear approximation involves finding values
of $a_0$ and $a_1$ to minimize

\[ E_1(a_0, a_1) = \sum_{i=1}^{10} |y_i - (a_1x_i + a_0)|. \]
This quantity is called the absolute deviation. To minimize a function of two variables, we
need to set its partial derivatives to zero and simultaneously solve the resulting equations.
In the case of the absolute deviation, we need to find $a_0$ and $a_1$ with

\[ 0 = \frac{\partial}{\partial a_0} \sum_{i=1}^{10} |y_i - (a_1x_i + a_0)| \quad \text{and} \quad 0 = \frac{\partial}{\partial a_1} \sum_{i=1}^{10} |y_i - (a_1x_i + a_0)|. \]

The problem is that the absolute-value function is not differentiable at zero, and we might 
not be able to find solutions to this pair of equations. 


Linear Least Squares 


The least squares approach to this problem involves determining the best approximating
line when the error involved is the sum of the squares of the differences between the y-values
on the approximating line and the given y-values. Hence, constants $a_0$ and $a_1$ must be found
that minimize the least squares error:

\[ E_2(a_0, a_1) = \sum_{i=1}^{10} [y_i - (a_1x_i + a_0)]^2. \]
The least squares method is the most convenient procedure for determining best linear 
approximations, but there are also important theoretical considerations that favor it. The 
minimax approach generally assigns too much weight to a bit of data that is badly in 
error, whereas the absolute deviation method does not give sufficient weight to a point 
that is considerably out of line with the approximation. The least squares approach puts 
substantially more weight on a point that is out of line with the rest of the data, but will 
not permit that point to completely dominate the approximation. An additional reason for 
considering the least squares approach involves the study of the statistical distribution of 
error. (See [Lar], pp. 463-481.) 
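
To make the three error measures concrete, the short sketch below evaluates $E_\infty$, $E_1$, and $E_2$ for the data of Table 8.1 and a single candidate line. The candidate line $y = 1.5x$ is a hypothetical choice made only for illustration (it is not the best line under any of the three criteria), and the function name `error_measures` is likewise an assumption.

```python
xs = list(range(1, 11))
ys = [1.3, 3.5, 4.2, 5.0, 7.0, 8.8, 10.1, 12.5, 13.0, 15.6]


def error_measures(a0, a1):
    """Return (E_inf, E_1, E_2) for the line a1 * x + a0 on the data above."""
    residuals = [y - (a1 * x + a0) for x, y in zip(xs, ys)]
    e_inf = max(abs(r) for r in residuals)   # minimax error
    e_1 = sum(abs(r) for r in residuals)     # absolute deviation
    e_2 = sum(r * r for r in residuals)      # least squares error
    return e_inf, e_1, e_2


e_inf, e_1, e_2 = error_measures(0.0, 1.5)
print(e_inf, e_1, e_2)
```

For this line the largest single residual (at $x = 4$) determines $E_\infty$ entirely, while it contributes only part of $E_1$ and $E_2$, which is the difference in weighting described above.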
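The general problem of fitting the best least squares line to a collection of data
$\{(x_i, y_i)\}_{i=1}^{m}$ involves minimizing the total error,

\[ E \equiv E_2(a_0, a_1) = \sum_{i=1}^{m} \left[ y_i - (a_1 x_i + a_0) \right]^2, \]

with respect to the parameters $a_0$ and $a_1$. For a minimum to occur, we need both

\[ \frac{\partial E}{\partial a_0} = 0 \quad\text{and}\quad \frac{\partial E}{\partial a_1} = 0, \]

that is,

\[ 0 = \frac{\partial}{\partial a_0} \sum_{i=1}^{m} \left[ y_i - (a_1 x_i + a_0) \right]^2 = 2 \sum_{i=1}^{m} (y_i - a_1 x_i - a_0)(-1) \]

and

\[ 0 = \frac{\partial}{\partial a_1} \sum_{i=1}^{m} \left[ y_i - (a_1 x_i + a_0) \right]^2 = 2 \sum_{i=1}^{m} (y_i - a_1 x_i - a_0)(-x_i). \]

These equations simplify to the normal equations:

\[ a_0 \cdot m + a_1 \sum_{i=1}^{m} x_i = \sum_{i=1}^{m} y_i \quad\text{and}\quad a_0 \sum_{i=1}^{m} x_i + a_1 \sum_{i=1}^{m} x_i^2 = \sum_{i=1}^{m} x_i y_i. \]

The solution to this system of equations is

\[ a_0 = \frac{\sum_{i=1}^{m} x_i^2 \sum_{i=1}^{m} y_i - \sum_{i=1}^{m} x_i y_i \sum_{i=1}^{m} x_i}{m \left( \sum_{i=1}^{m} x_i^2 \right) - \left( \sum_{i=1}^{m} x_i \right)^2} \tag{8.1} \]

and

\[ a_1 = \frac{m \sum_{i=1}^{m} x_i y_i - \sum_{i=1}^{m} x_i \sum_{i=1}^{m} y_i}{m \left( \sum_{i=1}^{m} x_i^2 \right) - \left( \sum_{i=1}^{m} x_i \right)^2}. \tag{8.2} \]

The word normal as used here implies perpendicular. The normal equations are obtained by
finding perpendicular directions to a multidimensional surface.


Example 1 Find the least squares line approximating the data in Table 8.1.


Solution We first extend the table to include $x_i^2$ and $x_i y_i$ and sum the columns. This is shown
in Table 8.2.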


Table 8.2

  $x_i$    $y_i$    $x_i^2$   $x_i y_i$   $P(x_i) = 1.538x_i - 0.360$
    1       1.3       1         1.3        1.18
    2       3.5       4         7.0        2.72
    3       4.2       9        12.6        4.25
    4       5.0      16        20.0        5.79
    5       7.0      25        35.0        7.33
    6       8.8      36        52.8        8.87
    7      10.1      49        70.7       10.41
    8      12.5      64       100.0       11.94
    9      13.0      81       117.0       13.48
   10      15.6     100       156.0       15.02
   55      81.0     385       572.4       $E = \sum_{i=1}^{10} (y_i - P(x_i))^2 \approx 2.34$

The normal equations (8.1) and (8.2) imply that

\[ a_0 = \frac{385(81) - 55(572.4)}{10(385) - (55)^2} = -0.360 \]

and

\[ a_1 = \frac{10(572.4) - 55(81)}{10(385) - (55)^2} = 1.538, \]

so $P(x) = 1.538x - 0.360$. The graph of this line and the data points are shown in Figure 8.3.
The approximate values given by the least squares technique at the data points are
in Table 8.2.


Figure 8.3 The data points of Table 8.1 and the least squares line $y = 1.538x - 0.360$.
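The closed-form solution (8.1) and (8.2) can be checked numerically. The following is a minimal sketch in Python, not part of the original text; the function name `least_squares_line` is an illustrative assumption, and the data are the $x_i$ and $y_i$ columns of Table 8.2.

```python
def least_squares_line(xs, ys):
    """Return (a0, a1) for the line y = a1*x + a0, using Eqs. (8.1) and (8.2)."""
    m = len(xs)
    sx = sum(xs)                                  # sum of x_i
    sy = sum(ys)                                  # sum of y_i
    sxx = sum(x * x for x in xs)                  # sum of x_i^2
    sxy = sum(x * y for x, y in zip(xs, ys))      # sum of x_i * y_i
    denom = m * sxx - sx ** 2                     # zero only if all x_i coincide
    a0 = (sxx * sy - sxy * sx) / denom            # Eq. (8.1)
    a1 = (m * sxy - sx * sy) / denom              # Eq. (8.2)
    return a0, a1


# Data from the first two columns of Table 8.2.
xs = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
ys = [1.3, 3.5, 4.2, 5.0, 7.0, 8.8, 10.1, 12.5, 13.0, 15.6]

a0, a1 = least_squares_line(xs, ys)
# Total least squares error of the fitted line at the data points.
E = sum((y - (a1 * x + a0)) ** 2 for x, y in zip(xs, ys))
print(f"a0 = {a0:.3f}, a1 = {a1:.3f}, E = {E:.2f}")
```

Rounded to three decimal places, the computed coefficients should agree with the values $a_0 = -0.360$ and $a_1 = 1.538$ obtained in Example 1.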


Polynomial Least Squares 


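The general problem of approximating a set of data, $\{(x_i, y_i) \mid i = 1, 2, \ldots, m\}$, with an
algebraic polynomial

\[ P_n(x) = a_n x^n + a_{n-1} x^{n-1} + \cdots + a_1 x + a_0, \]

of degree $n < m - 1$, using the least squares procedure is handled similarly. We choose the
constants $a_0, a_1, \ldots, a_n$ to minimize the least squares error $E = E_2(a_0, a_1, \ldots, a_n)$, where

\[
\begin{aligned}
E &= \sum_{i=1}^{m} (y_i - P_n(x_i))^2 \\
  &= \sum_{i=1}^{m} y_i^2 - 2 \sum_{i=1}^{m} P_n(x_i)\, y_i + \sum_{i=1}^{m} (P_n(x_i))^2 \\
  &= \sum_{i=1}^{m} y_i^2 - 2 \sum_{j=0}^{n} a_j \left( \sum_{i=1}^{m} y_i x_i^j \right) + \sum_{j=0}^{n} \sum_{k=0}^{n} a_j a_k \left( \sum_{i=1}^{m} x_i^{j+k} \right).
\end{aligned}
\]

As in the linear case, for E to be minimized it is necessary that $\partial E / \partial a_j = 0$, for each
$j = 0, 1, \ldots, n$. Thus, for each j, we must have

\[ 0 = \frac{\partial E}{\partial a_j} = -2 \sum_{i=1}^{m} y_i x_i^j + 2 \sum_{k=0}^{n} a_k \sum_{i=1}^{m} x_i^{j+k}. \]

This gives $n + 1$ normal equations in the $n + 1$ unknowns $a_j$. These are

\[ \sum_{k=0}^{n} a_k \sum_{i=1}^{m} x_i^{j+k} = \sum_{i=1}^{m} y_i x_i^j, \quad \text{for each } j = 0, 1, \ldots, n. \tag{8.3} \]

It is helpful to write the equations as follows:

\[
\begin{aligned}
a_0 \sum_{i=1}^{m} x_i^0 + a_1 \sum_{i=1}^{m} x_i^1 + a_2 \sum_{i=1}^{m} x_i^2 + \cdots + a_n \sum_{i=1}^{m} x_i^n &= \sum_{i=1}^{m} y_i x_i^0, \\
a_0 \sum_{i=1}^{m} x_i^1 + a_1 \sum_{i=1}^{m} x_i^2 + a_2 \sum_{i=1}^{m} x_i^3 + \cdots + a_n \sum_{i=1}^{m} x_i^{n+1} &= \sum_{i=1}^{m} y_i x_i^1, \\
&\ \ \vdots \\
a_0 \sum_{i=1}^{m} x_i^n + a_1 \sum_{i=1}^{m} x_i^{n+1} + a_2 \sum_{i=1}^{m} x_i^{n+2} + \cdots + a_n \sum_{i=1}^{m} x_i^{2n} &= \sum_{i=1}^{m} y_i x_i^n.
\end{aligned}
\]

These normal equations have a unique solution provided that the $x_i$ are distinct (see
Exercise 14).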


Example 2 Fit the data in Table 8.3 with the discrete least squares polynomial of degree at most 2.

Table 8.3

  i    $x_i$    $y_i$
  1    0        1.0000
  2    0.25     1.2840
  3    0.50     1.6487
  4    0.75     2.1170
  5    1.00     2.7183

Solution For this problem, $n = 2$, $m = 5$, and the three normal equations are

\[
\begin{aligned}
5a_0 + 2.5a_1 + 1.875a_2 &= 8.7680, \\
2.5a_0 + 1.875a_1 + 1.5625a_2 &= 5.4514, \\
1.875a_0 + 1.5625a_1 + 1.3828a_2 &= 4.4015.
\end{aligned}
\]

To solve this system using Maple, we first define the equations

    eq1 := 5*a0 + 2.5*a1 + 1.875*a2 = 8.7680:
    eq2 := 2.5*a0 + 1.875*a1 + 1.5625*a2 = 5.4514:
    eq3 := 1.875*a0 + 1.5625*a1 + 1.3828*a2 = 4.4015:

and then solve the system with

    solve({eq1, eq2, eq3}, {a0, a1, a2})

This gives

    {a0 = 1.005075519, a1 = 0.8646758482, a2 = 0.8431641518}
Thus the least squares polynomial of degree 2 fitting the data in Table 8.3 is

\[ P_2(x) = 1.0051 + 0.86468x + 0.84316x^2, \]

whose graph is shown in Figure 8.4. At the given values of $x_i$ we have the approximations
shown in Table 8.4.

Figure 8.4 The data of Table 8.3 and the curve $y = 1.0051 + 0.86468x + 0.84316x^2$.

Table 8.4

  i                 1         2         3         4         5
  $x_i$             0         0.25      0.50      0.75      1.00
  $y_i$             1.0000    1.2840    1.6487    2.1170    2.7183
  $P(x_i)$          1.0051    1.2740    1.6482    2.1279    2.7129
  $y_i - P(x_i)$   -0.0051    0.0100    0.0004   -0.0109    0.0054

The total error,

\[ E = \sum_{i=1}^{5} (y_i - P(x_i))^2 = 2.74 \times 10^{-4}, \]

is the least that can be obtained by using a polynomial of degree at most 2.


Maple has a function called LinearFit within the Statistics package which can be used
to compute the discrete least squares approximations. To compute the approximation in
Example 2 we first load the package and define the data

    with(Statistics): xvals := Vector([0, 0.25, 0.5, 0.75, 1]): yvals := Vector([1, 1.284, 1.6487, 2.117, 2.7183]):

To define the least squares polynomial for this data we enter the command

    P := x -> LinearFit([1, x, x^2], xvals, yvals, x): P(x)

Maple returns a result which, rounded to 5 decimal places, is

    1.00514 + 0.86418x + 0.84366x^2

The approximation at a specific value, for example at x = 1.7, is found with P(1.7), which gives

    4.91242
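For readers without Maple, the normal equations (8.3) can also be assembled and solved directly. The sketch below is an illustration in Python rather than the text's method; the helper names `solve_linear_system` and `least_squares_poly` are assumptions introduced here. It forms the sums $\sum x_i^{j+k}$ and $\sum y_i x_i^j$ without rounding, so its output should be compared with the LinearFit result rather than with the coefficients obtained from the rounded system in Example 2.

```python
def solve_linear_system(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]      # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):                       # back substitution
        s = sum(M[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (M[r][n] - s) / M[r][r]
    return x


def least_squares_poly(xs, ys, n):
    """Coefficients [a0, ..., an] satisfying the normal equations (8.3)."""
    A = [[sum(x ** (j + k) for x in xs) for k in range(n + 1)]
         for j in range(n + 1)]
    b = [sum(y * x ** j for x, y in zip(xs, ys)) for j in range(n + 1)]
    return solve_linear_system(A, b)


# Data of Table 8.3.
xs = [0, 0.25, 0.50, 0.75, 1.00]
ys = [1.0000, 1.2840, 1.6487, 2.1170, 2.7183]

a = least_squares_poly(xs, ys, 2)


def P(x):
    """Evaluate the fitted polynomial a0 + a1*x + a2*x^2."""
    return sum(ak * x ** k for k, ak in enumerate(a))


print([round(ak, 5) for ak in a], round(P(1.7), 5))
```

Forming the normal equations explicitly is adequate for small n. For larger degrees the system becomes ill-conditioned, which is one motivation for the orthogonal polynomial approach of Section 8.2.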


At times it is appropriate to assume that the data are exponentially related. This requires
the approximating function to be of the form

\[ y = b e^{ax} \tag{8.4} \]

or

\[ y = b x^{a}, \tag{8.5} \]

for some constants a and b. The difficulty with applying the least squares procedure in a
situation of this type comes from attempting to minimize

\[ E = \sum_{i=1}^{m} (y_i - b e^{a x_i})^2, \quad \text{in the case of Eq. (8.4)}, \]

or

\[ E = \sum_{i=1}^{m} (y_i - b x_i^{a})^2, \quad \text{in the case of Eq. (8.5)}. \]

The normal equations associated with these procedures are obtained from either

\[ 0 = \frac{\partial E}{\partial b} = 2 \sum_{i=1}^{m} (y_i - b e^{a x_i})(-e^{a x_i}) \]

and

\[ 0 = \frac{\partial E}{\partial a} = 2 \sum_{i=1}^{m} (y_i - b e^{a x_i})(-b x_i e^{a x_i}), \quad \text{in the case of Eq. (8.4)}; \]

or

\[ 0 = \frac{\partial E}{\partial b} = 2 \sum_{i=1}^{m} (y_i - b x_i^{a})(-x_i^{a}) \]

and

\[ 0 = \frac{\partial E}{\partial a} = 2 \sum_{i=1}^{m} (y_i - b x_i^{a})(-b (\ln x_i) x_i^{a}), \quad \text{in the case of Eq. (8.5)}. \]

No exact solution to either of these systems in a and b can generally be found.
The method that is commonly used when the data are suspected to be exponentially
related is to consider the logarithm of the approximating equation:

\[ \ln y = \ln b + ax, \quad \text{in the case of Eq. (8.4)}, \]

and

\[ \ln y = \ln b + a \ln x, \quad \text{in the case of Eq. (8.5)}. \]
In either case, a linear problem now appears, and solutions for $\ln b$ and $a$ can be obtained
by appropriately modifying the normal equations (8.1) and (8.2).

However, the approximation obtained in this manner is not the least squares approximation
for the original problem, and this approximation can in some cases differ significantly
from the least squares approximation to the original problem. The application in
Exercise 13 describes such a problem. This application will be reconsidered as Exercise 11 in
Section 10.3, where the exact solution to the exponential least squares problem is
approximated by using methods suitable for solving nonlinear systems of equations.


Illustration Consider the collection of data in the first three columns of Table 8.5.

Table 8.5

  i    $x_i$    $y_i$    $\ln y_i$   $x_i^2$    $x_i \ln y_i$
  1    1.00     5.10     1.629       1.0000     1.629
  2    1.25     5.79     1.756       1.5625     2.195
  3    1.50     6.53     1.876       2.2500     2.814
  4    1.75     7.45     2.008       3.0625     3.514
  5    2.00     8.46     2.135       4.0000     4.270
       7.50              9.404      11.875     14.422

If $x_i$ is graphed with $\ln y_i$, the data appear to have a linear relation, so it is reasonable to
assume an approximation of the form

\[ y = b e^{ax}, \quad \text{which implies that} \quad \ln y = \ln b + ax. \]

Extending the table and summing the appropriate columns gives the remaining data in
Table 8.5.
Using the normal equations (8.1) and (8.2),

\[ a = \frac{(5)(14.422) - (7.5)(9.404)}{(5)(11.875) - (7.5)^2} = 0.5056 \]

and

\[ \ln b = \frac{(11.875)(9.404) - (14.422)(7.5)}{(5)(11.875) - (7.5)^2} = 1.122. \]

With $\ln b = 1.122$ we have $b = e^{1.122} = 3.071$, and the approximation assumes the form

\[ y = 3.071 e^{0.5056x}. \]

At the data points this gives the values in Table 8.6. (See Figure 8.5.)

Table 8.6

  i    $x_i$    $y_i$    $3.071 e^{0.5056 x_i}$    $|y_i - 3.071 e^{0.5056 x_i}|$
  1    1.00     5.10     5.09                      0.01
  2    1.25     5.79     5.78                      0.01
  3    1.50     6.53     6.56                      0.03
  4    1.75     7.45     7.44                      0.01
  5    2.00     8.46     8.44                      0.02

Figure 8.5 The data of Table 8.5 and the curve $y = 3.071 e^{0.5056x}$.


Exponential and other nonlinear discrete least squares approximations can be obtained in
the Statistics package by using the commands ExponentialFit and NonlinearFit.

For example, the approximation in the Illustration can be obtained by first defining the
data with

    X := Vector([1, 1.25, 1.5, 1.75, 2]): Y := Vector([5.1, 5.79, 6.53, 7.45, 8.46]):

and then issuing the command

    ExponentialFit(X, Y, x)

This gives the result, rounded to 5 decimal places,

\[ 3.07249 e^{0.50572x}. \]

If instead the NonlinearFit command is issued, the approximation produced uses methods
of Chapter 10 for solving a system of nonlinear equations. The approximation that Maple
gives in this case is

\[ 3.06658 (1.66023)^x \approx 3.06658 e^{0.507x}. \]
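The logarithmic linearization used in the Illustration can be sketched in a few lines of Python. This is an illustrative example, not taken from the text; the function name `fit_exponential_via_logs` is an assumption. It applies (8.1) and (8.2) to the points $(x_i, \ln y_i)$ without rounding the logarithms, so the result should be close to the ExponentialFit output above rather than to the hand computation with three-decimal logarithms.

```python
import math


def fit_exponential_via_logs(xs, ys):
    """Fit y = b*exp(a*x) by fitting a least squares line to (x_i, ln y_i).

    Note: this minimizes the squared error in ln y, not the squared error
    in y, so it is not the least squares solution of the original problem.
    """
    m = len(xs)
    ln_ys = [math.log(y) for y in ys]
    sx = sum(xs)
    sl = sum(ln_ys)
    sxx = sum(x * x for x in xs)
    sxl = sum(x * l for x, l in zip(xs, ln_ys))
    denom = m * sxx - sx ** 2
    a = (m * sxl - sx * sl) / denom          # slope, from Eq. (8.2)
    ln_b = (sxx * sl - sxl * sx) / denom     # intercept, from Eq. (8.1)
    return a, math.exp(ln_b)


# Data from the first three columns of Table 8.5.
xs = [1.00, 1.25, 1.50, 1.75, 2.00]
ys = [5.10, 5.79, 6.53, 7.45, 8.46]

a, b = fit_exponential_via_logs(xs, ys)
# Error measured in the original (untransformed) variables.
E = sum((y - b * math.exp(a * x)) ** 2 for x, y in zip(xs, ys))
print(f"a = {a:.5f}, b = {b:.5f}, E = {E:.5f}")
```

Because the transformed problem weights the data differently, the error E reported here need not be the smallest attainable by a curve of the form $b e^{ax}$.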


EXERCISE SET 8.1 


1. Compute the linear least squares polynomial for the data of Example 2.

2. Compute the least squares polynomial of degree 2 for the data of Example 1, and compare the total
error E for the two polynomials.

3. Find the least squares polynomials of degrees 1, 2, and 3 for the data in the following table. Compute
the error E in each case. Graph the data and the polynomials.

  $x_i$   1.0    1.1    1.3    1.5    1.9    2.1
  $y_i$   1.84   1.96   2.21   2.45   2.94   3.18

4. Find the least squares polynomials of degrees 1, 2, and 3 for the data in the following table. Compute
the error E in each case. Graph the data and the polynomials.

  $x_i$   0      0.15    0.31    0.5     0.6     0.75
  $y_i$   1.0    1.004   1.031   1.117   1.223   1.422


5. Given the data:

  $x_i$   4.0      4.2      4.5      4.7      5.1      5.5      5.9      6.3      6.8      7.1
  $y_i$   102.56   113.18   130.11   142.05   167.53   195.14   224.87   256.73   299.50   326.72

a. Construct the least squares polynomial of degree 1, and compute the error.
b. Construct the least squares polynomial of degree 2, and compute the error.
c. Construct the least squares polynomial of degree 3, and compute the error.
d. Construct the least squares approximation of the form $b e^{ax}$, and compute the error.
e. Construct the least squares approximation of the form $b x^{a}$, and compute the error.


6. Repeat Exercise 5 for the following data.

  $x_i$   0.2        0.3        0.6       0.9       1.1      1.3      1.4      1.6
  $y_i$   0.050446   0.098426   0.33277   0.72660   1.0972   1.5697   1.8487   2.5015


7. In the lead example of this chapter, an experiment was described to determine the spring constant k
in Hooke's law:

\[ F(l) = k(l - E). \]

The function F is the force required to stretch the spring l units, where the constant E = 5.3 in. is the
length of the unstretched spring.

a. Suppose measurements are made of the length l, in inches, for applied weights F(l), in pounds,
as given in the following table.

  F(l)    l
   2      7.0
   4      9.4
   6     12.3

Find the least squares approximation for k.

b. Additional measurements are made, giving more data:

  F(l)    l
   3      8.3
   5     11.3
   8     14.4
  10     15.9

Compute the new least squares approximation for k. Which of (a) or (b) best fits the total
experimental data?


8. The following list contains homework grades and the final-examination grades for 30 numerical 
analysis students. Find the equation of the least squares line for this data, and use this line to determine 
the homework grade required to predict minimal A (90%) and D (60%) grades on the final. 
Homework Final Homework Final


302 45 323 83 
325 72 337 99 
285 54 337 70 
339 54 304 62 
334 79 319 66 
322 65 234 51 
331 99 337 53 
279 63 351 100 
316 65 339 67 
347 99 343 83 
343 83 314 42 
290 74 344 79 
326 76 185 59 
233 57 340 75 
254 45 316 45 


9. The following table lists the college grade-point averages of 20 mathematics and computer science 
majors, together with the scores that these students received on the mathematics portion of the ACT 
(American College Testing Program) test while in high school. Plot these data, and find the equation 
of the least squares line for this data. 


ACT Grade-point ACT  Grade-point 


score average score average 
28 3.84 29 3.75 
25 3.21 28 3.65 
28 3.23 27 3.87 
27 3.63 29 3.15 
28 3.75 21 1.66 
33 3.20 28 3.12 
28 3.41 28 2.96 
29 3.38 26 2.92 
23 3.53 30 3.10 
27 2.03 24 2.81 


10. The following set of data, presented to the Senate Antitrust Subcommittee, shows the comparative 
crash-survivability characteristics of cars in various classes. Find the least squares line that approxi- 
mates these data. (The table shows the percent of accident-involved vehicles in which the most severe 
injury was fatal or serious.) 


                                      Average     Percent
  Type                                Weight      Occurrence
  1. Domestic luxury regular          4800 lb     3.1
  2. Domestic intermediate regular    3700 lb     4.0
  3. Domestic economy regular         3400 lb     5.2
  4. Domestic compact                 2800 lb     6.4
  5. Foreign compact                  1900 lb     9.6


11. To determine a relationship between the number of fish and the number of species of fish in samples 
taken for a portion of the Great Barrier Reef, P. Sale and R. Dybdahl [SD] fit a linear least squares 
polynomial to the following collection of data, which were collected in samples over a 2-year period. 
Let x be the number of fish in the sample and y be the number of species in the sample. 
   x     y       x     y       x     y
  13    11      29    12      60    14
  15    10      30    14      62    21
  16    11      31    16      64    21
  21    12      36    17      70    24
  22    12      40    13      72    17
  23    13      42    14     100    23
  25    13      55    22     130    34


Determine the linear least squares polynomial for these data. 
12. To determine a functional relationship between the attenuation coefficient and the thickness of a 
sample of taconite, V. P. Singh [Si] fits a collection of data by using a linear least squares polynomial. 
The following collection of data is taken from a graph in that paper. Find the linear least squares 
polynomial fitting these data. 


Thickness (cm) Attenuation coefficient (dB/cm) 


0.040 26.5 
0.041 28.1 
0.055 25.2 
0.056 26.0 
0.062 24.0 
0.071 25.0 
0.071 26.4 
0.078 27.2 
0.082 25.6 
0.090 25.0 
0.092 26.8 
0.100 24.8 
0.105 27.0 
0.120 25.0 
0.123 27.3 
0.130 26.9 
0.140 26.2 


13. In a paper dealing with the efficiency of energy utilization of the larvae of the modest sphinx moth
(Pachysphinx modesta), L. Schroeder [Schr1] used the following data to determine a relation
between W, the live weight of the larvae in grams, and R, the oxygen consumption of the larvae in
milliliters/hour. For biological reasons, it is assumed that a relationship in the form of $R = bW^{a}$ exists
between W and R.

a. Find the logarithmic linear least squares polynomial by using

\[ \ln R = \ln b + a \ln W. \]

b. Compute the error associated with the approximation in part (a):

\[ E = \sum_{i=1}^{37} (R_i - b W_i^{a})^2. \]

c. Modify the logarithmic least squares equation in part (a) by adding the quadratic term $c(\ln W_i)^2$,
and determine the logarithmic quadratic least squares polynomial.

d. Determine the formula for and compute the error associated with the approximation in part (c).

W R W R W R W R W R
0.017 0.154 0.025 0.23 0.020 0.181 0.020 0.180 0.025 0.234 
0.087 0.296 0.111 0.357 0.085 0.260 0.119 0.299 0.233 0.537 
0.174 0.363 0.211 0.366 0.171 0.334 0.210 0.428 0.783 1.47 
1.11 0.531 0.999 0.771 1.29 0.87 1.32 1.15 1.35 2.48 
1.74 2.23 3.02 2.01 3.04 3.59 3.34 2.83 1.69 1.44 
4.09 3.58 4.28 3.28 4.29 3.40 5.48 4.15 2.75 1.84 
5.45 3.52 4.58 2.96 5.30 3.88 4.83 4.66 
5.96 2.40 4.68 5.10 5.53 6.94 


14. Show that the normal equations (8.3) resulting from discrete least squares approximation yield a
symmetric and nonsingular matrix and hence have a unique solution. [Hint: Let $A = (a_{ij})$, where

\[ a_{ij} = \sum_{k=1}^{m} x_k^{i+j-2} \]

and $x_1, x_2, \ldots, x_m$ are distinct with $n < m - 1$. Suppose A is singular and that $c \neq 0$ is such that
$c^t A c = 0$. Show that the nth-degree polynomial whose coefficients are the coordinates of c has more
than n roots, and use this to establish a contradiction.]


8.2 Orthogonal Polynomials and Least Squares Approximation


The previous section considered the problem of least squares approximation to fit a collec- 
tion of data. The other approximation problem mentioned in the introduction concerns the 
approximation of functions. 

Suppose $f \in C[a, b]$ and that a polynomial $P_n(x)$ of degree at most n is required that
will minimize the error

\[ \int_a^b [f(x) - P_n(x)]^2 \, dx. \]

To determine a least squares approximating polynomial, that is, a polynomial to
minimize this expression, let

\[ P_n(x) = a_n x^n + a_{n-1} x^{n-1} + \cdots + a_1 x + a_0 = \sum_{k=0}^{n} a_k x^k, \]

and define, as shown in Figure 8.6,

\[ E \equiv E_2(a_0, a_1, \ldots, a_n) = \int_a^b \left( f(x) - \sum_{k=0}^{n} a_k x^k \right)^2 dx. \]

Figure 8.6 The function f(x), the polynomial $\sum_{k=0}^{n} a_k x^k$, and the squared difference between them on [a, b].

The problem is to find real coefficients $a_0, a_1, \ldots, a_n$ that will minimize E. A necessary
condition for the numbers $a_0, a_1, \ldots, a_n$ to minimize E is that

\[ \frac{\partial E}{\partial a_j} = 0, \quad \text{for each } j = 0, 1, \ldots, n. \]

Since

\[ E = \int_a^b [f(x)]^2 \, dx - 2 \sum_{k=0}^{n} a_k \int_a^b x^k f(x) \, dx + \int_a^b \left( \sum_{k=0}^{n} a_k x^k \right)^2 dx, \]

we have

\[ \frac{\partial E}{\partial a_j} = -2 \int_a^b x^j f(x) \, dx + 2 \sum_{k=0}^{n} a_k \int_a^b x^{j+k} \, dx. \]

Hence, to find $P_n(x)$, the $(n + 1)$ linear normal equations

\[ \sum_{k=0}^{n} a_k \int_a^b x^{j+k} \, dx = \int_a^b x^j f(x) \, dx, \quad \text{for each } j = 0, 1, \ldots, n, \tag{8.6} \]

must be solved for the $(n + 1)$ unknowns $a_j$. The normal equations always have a unique
solution provided that $f \in C[a, b]$. (See Exercise 15.)


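Example 1 Find the least squares approximating polynomial of degree 2 for the function $f(x) = \sin \pi x$
on the interval [0, 1].


Solution The normal equations for $P_2(x) = a_2 x^2 + a_1 x + a_0$ are

\[
\begin{aligned}
a_0 \int_0^1 1 \, dx + a_1 \int_0^1 x \, dx + a_2 \int_0^1 x^2 \, dx &= \int_0^1 \sin \pi x \, dx, \\
a_0 \int_0^1 x \, dx + a_1 \int_0^1 x^2 \, dx + a_2 \int_0^1 x^3 \, dx &= \int_0^1 x \sin \pi x \, dx, \\
a_0 \int_0^1 x^2 \, dx + a_1 \int_0^1 x^3 \, dx + a_2 \int_0^1 x^4 \, dx &= \int_0^1 x^2 \sin \pi x \, dx.
\end{aligned}
\]

Performing the integration yields

\[ a_0 + \frac{1}{2} a_1 + \frac{1}{3} a_2 = \frac{2}{\pi}, \quad \frac{1}{2} a_0 + \frac{1}{3} a_1 + \frac{1}{4} a_2 = \frac{1}{\pi}, \quad \frac{1}{3} a_0 + \frac{1}{4} a_1 + \frac{1}{5} a_2 = \frac{\pi^2 - 4}{\pi^3}. \]

These three equations in three unknowns can be solved to obtain

\[ a_0 = \frac{12\pi^2 - 120}{\pi^3} \approx -0.050465 \quad\text{and}\quad a_1 = -a_2 = \frac{720 - 60\pi^2}{\pi^3} \approx 4.12251. \]

Consequently, the least squares polynomial approximation of degree 2 for $f(x) = \sin \pi x$
on [0, 1] is $P_2(x) = -4.12251x^2 + 4.12251x - 0.050465$. (See Figure 8.7.)

Figure 8.7 The graphs of $y = \sin \pi x$ and $y = P_2(x)$ on [0, 1].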
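The normal equations (8.6) for this example can also be set up and solved numerically. The following Python sketch is an illustration under stated assumptions, not the text's procedure: the right-hand sides are approximated with a composite Simpson's rule instead of being integrated exactly, and the helper names `simpson`, `gauss_solve`, and `continuous_least_squares` are introduced here for the example only.

```python
import math


def simpson(g, a, b, n=200):
    """Composite Simpson's rule on [a, b] with n (even) subintervals."""
    h = (b - a) / n
    total = g(a) + g(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * g(a + i * h)
    return total * h / 3


def gauss_solve(A, rhs):
    """Solve A x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    M = [row[:] + [r] for row, r in zip(A, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(M[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (M[r][n] - s) / M[r][r]
    return x


def continuous_least_squares(f, a, b, n):
    """Coefficients [a0, ..., an] of the least squares polynomial, Eq. (8.6)."""
    # Matrix entries: integral of x^(j+k) over [a, b], in closed form.
    A = [[(b ** (j + k + 1) - a ** (j + k + 1)) / (j + k + 1)
          for k in range(n + 1)] for j in range(n + 1)]
    # Right-hand sides: integral of x^j f(x), approximated numerically.
    rhs = [simpson(lambda x, j=j: x ** j * f(x), a, b) for j in range(n + 1)]
    return gauss_solve(A, rhs)


coeffs = continuous_least_squares(lambda x: math.sin(math.pi * x), 0.0, 1.0, 2)
print([round(c, 5) for c in coeffs])
```

The computed coefficients should agree with $a_0 \approx -0.050465$ and $a_1 = -a_2 \approx 4.12251$ to the accuracy of the quadrature.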
Example 1 illustrates a difficulty in obtaining a least squares polynomial approximation.
An $(n + 1) \times (n + 1)$ linear system for the unknowns $a_0, \ldots, a_n$ must be solved, and the
coefficients in the linear system are of the form

\[ \int_a^b x^{j+k} \, dx = \frac{b^{j+k+1} - a^{j+k+1}}{j + k + 1}, \]

a linear system that does not have an easily computed numerical solution. The matrix in the
linear system is known as a Hilbert matrix, which is a classic example for demonstrating
round-off error difficulties. (See Exercise 11 of Section 7.5.)

David Hilbert (1862-1943) was the dominant mathematician at the turn of the 20th century. He is
best remembered for giving a talk at the International Congress of Mathematicians in Paris in 1900
in which he posed 23 problems that he thought would be important for mathematicians in
the next century.

Another disadvantage is similar to the situation that occurred when the Lagrange polynomials
were first introduced in Section 3.1. The calculations that were performed in obtaining
the best nth-degree polynomial, $P_n(x)$, do not lessen the amount of work required
to obtain $P_{n+1}(x)$, the polynomial of next higher degree.


Linearly Independent Functions 


A different technique to obtain least squares approximations will now be considered. This 
turns out to be computationally efficient, and once P,,(x) is known, it is easy to determine 
P,+1(x). To facilitate the discussion, we need some new concepts. 


Definition 8.1 The set of functions $\{\phi_0, \ldots, \phi_n\}$ is said to be linearly independent on [a, b] if, whenever

\[ c_0 \phi_0(x) + c_1 \phi_1(x) + \cdots + c_n \phi_n(x) = 0, \quad \text{for all } x \in [a, b], \]

we have $c_0 = c_1 = \cdots = c_n = 0$. Otherwise the set of functions is said to be linearly
dependent.

Theorem 8.2 Suppose that, for each $j = 0, 1, \ldots, n$, $\phi_j(x)$ is a polynomial of degree j. Then $\{\phi_0, \ldots, \phi_n\}$
is linearly independent on any interval [a, b].


Proof Let $c_0, \ldots, c_n$ be real numbers for which

\[ P(x) = c_0 \phi_0(x) + c_1 \phi_1(x) + \cdots + c_n \phi_n(x) = 0, \quad \text{for all } x \in [a, b]. \]

The polynomial P(x) vanishes on [a, b], so it must be the zero polynomial, and the
coefficients of all the powers of x are zero. In particular, the coefficient of $x^n$ is zero. But $c_n \phi_n(x)$
is the only term in P(x) that contains $x^n$, so we must have $c_n = 0$. Hence

\[ P(x) = \sum_{j=0}^{n-1} c_j \phi_j(x). \]

In this representation of P(x), the only term that contains a power of $x^{n-1}$ is $c_{n-1} \phi_{n-1}(x)$,
so this term must also be zero and

\[ P(x) = \sum_{j=0}^{n-2} c_j \phi_j(x). \]

In like manner, the remaining constants $c_{n-2}, c_{n-3}, \ldots, c_1, c_0$ are all zero, which implies
that $\{\phi_0, \phi_1, \ldots, \phi_n\}$ is linearly independent on [a, b].

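Example 2 Let $\phi_0(x) = 2$, $\phi_1(x) = x - 3$, and $\phi_2(x) = x^2 + 2x + 7$, and $Q(x) = a_0 + a_1 x + a_2 x^2$. Show
that there exist constants $c_0$, $c_1$, and $c_2$ such that $Q(x) = c_0 \phi_0(x) + c_1 \phi_1(x) + c_2 \phi_2(x)$.


Solution By Theorem 8.2, $\{\phi_0, \phi_1, \phi_2\}$ is linearly independent on any interval [a, b]. First
note that

\[ 1 = \frac{1}{2} \phi_0(x), \quad x = \phi_1(x) + 3 = \phi_1(x) + \frac{3}{2} \phi_0(x), \]

and

\[
\begin{aligned}
x^2 &= \phi_2(x) - 2x - 7 = \phi_2(x) - 2 \left[ \phi_1(x) + \frac{3}{2} \phi_0(x) \right] - 7 \left[ \frac{1}{2} \phi_0(x) \right] \\
    &= \phi_2(x) - 2 \phi_1(x) - \frac{13}{2} \phi_0(x).
\end{aligned}
\]

Hence

\[
\begin{aligned}
Q(x) &= a_0 \left[ \frac{1}{2} \phi_0(x) \right] + a_1 \left[ \phi_1(x) + \frac{3}{2} \phi_0(x) \right] + a_2 \left[ \phi_2(x) - 2 \phi_1(x) - \frac{13}{2} \phi_0(x) \right] \\
     &= \left( \frac{1}{2} a_0 + \frac{3}{2} a_1 - \frac{13}{2} a_2 \right) \phi_0(x) + [a_1 - 2a_2] \phi_1(x) + a_2 \phi_2(x).
\end{aligned}
\]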
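The change of basis derived in Example 2 is easy to check with exact rational arithmetic. The short Python sketch below is an illustration only: the function name `basis_coefficients` and the sample coefficients $a_0 = 5$, $a_1 = -1$, $a_2 = 4$ are arbitrary assumptions chosen for the check.

```python
from fractions import Fraction as F


def phi0(x):
    return 2


def phi1(x):
    return x - 3


def phi2(x):
    return x * x + 2 * x + 7


def basis_coefficients(a0, a1, a2):
    """Return (c0, c1, c2) with Q = c0*phi0 + c1*phi1 + c2*phi2 (Example 2)."""
    c0 = F(1, 2) * a0 + F(3, 2) * a1 - F(13, 2) * a2
    c1 = a1 - 2 * a2
    c2 = a2
    return c0, c1, c2


# Arbitrary illustrative coefficients of Q(x) = a0 + a1*x + a2*x^2.
a0, a1, a2 = F(5), F(-1), F(4)
c0, c1, c2 = basis_coefficients(a0, a1, a2)

# Two quadratics that agree at more than two points are identical,
# so checking four points confirms the identity for all x.
for x in [F(-2), F(0), F(1, 3), F(7)]:
    Q = a0 + a1 * x + a2 * x * x
    assert Q == c0 * phi0(x) + c1 * phi1(x) + c2 * phi2(x)

print(c0, c1, c2)
```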


The situation illustrated in Example 2 holds in a much more general setting. Let $\Pi_n$
denote the set of all polynomials of degree at most n. The following result is used extensively
in many applications of linear algebra. Its proof is considered in Exercise 13.


Theorem 8.3 Suppose that $\{\phi_0(x), \phi_1(x), \ldots, \phi_n(x)\}$ is a collection of linearly independent polynomials
in $\Pi_n$. Then any polynomial in $\Pi_n$ can be written uniquely as a linear combination of $\phi_0(x)$,
$\phi_1(x), \ldots, \phi_n(x)$.
Orthogonal Functions


To discuss general function approximation requires the introduction of the notions of weight
functions and orthogonality.

The word orthogonal means right-angled. So in a sense, orthogonal functions are
perpendicular to one another.


Definition 8.4 An integrable function w is called a weight function on the interval I if $w(x) \geq 0$, for all
x in I, but $w(x) \not\equiv 0$ on any subinterval of I.


The purpose of a weight function is to assign varying degrees of importance to
approximations on certain portions of the interval. For example, the weight function

\[ w(x) = \frac{1}{\sqrt{1 - x^2}} \]

places less emphasis near the center of the interval (-1, 1) and more emphasis when |x| is
near 1 (see Figure 8.8). This weight function is used in the next section.

Figure 8.8 Graph of the weight function $w(x) = 1/\sqrt{1 - x^2}$ on (-1, 1).

Suppose $\{\phi_0, \phi_1, \ldots, \phi_n\}$ is a set of linearly independent functions on [a, b] and w is a
weight function for [a, b]. Given $f \in C[a, b]$, we seek a linear combination

\[ P(x) = \sum_{k=0}^{n} a_k \phi_k(x) \]

to minimize the error

\[ E = E(a_0, \ldots, a_n) = \int_a^b w(x) \left[ f(x) - \sum_{k=0}^{n} a_k \phi_k(x) \right]^2 dx. \]


This problem reduces to the situation considered at the beginning of this section in the
special case when $w(x) \equiv 1$ and $\phi_k(x) = x^k$, for each $k = 0, 1, \ldots, n$.

The normal equations associated with this problem are derived from the fact that for
each $j = 0, 1, \ldots, n$,

\[ 0 = \frac{\partial E}{\partial a_j} = -2 \int_a^b w(x) \left[ f(x) - \sum_{k=0}^{n} a_k \phi_k(x) \right] \phi_j(x) \, dx. \]

The system of normal equations can be written

\[ \int_a^b w(x) f(x) \phi_j(x) \, dx = \sum_{k=0}^{n} a_k \int_a^b w(x) \phi_k(x) \phi_j(x) \, dx, \quad \text{for } j = 0, 1, \ldots, n. \]

If the functions $\phi_0, \phi_1, \ldots, \phi_n$ can be chosen so that

\[ \int_a^b w(x) \phi_k(x) \phi_j(x) \, dx = \begin{cases} 0, & \text{when } j \neq k, \\ \alpha_j > 0, & \text{when } j = k, \end{cases} \tag{8.7} \]

then the normal equations will reduce to

\[ \int_a^b w(x) f(x) \phi_j(x) \, dx = a_j \int_a^b w(x) [\phi_j(x)]^2 \, dx = a_j \alpha_j, \]

for each $j = 0, 1, \ldots, n$. These are easily solved to give

\[ a_j = \frac{1}{\alpha_j} \int_a^b w(x) f(x) \phi_j(x) \, dx. \]

Hence the least squares approximation problem is greatly simplified when the functions
$\phi_0, \phi_1, \ldots, \phi_n$ are chosen to satisfy the orthogonality condition in Eq. (8.7). The remainder
of this section is devoted to studying collections of this type.
Definition 8.5 $\{\phi_0, \phi_1, \ldots, \phi_n\}$ is said to be an orthogonal set of functions for the interval [a, b] with
respect to the weight function w if

\[ \int_a^b w(x) \phi_k(x) \phi_j(x) \, dx = \begin{cases} 0, & \text{when } j \neq k, \\ \alpha_j > 0, & \text{when } j = k. \end{cases} \]

If, in addition, $\alpha_j = 1$ for each $j = 0, 1, \ldots, n$, the set is said to be orthonormal.


This definition, together with the remarks preceding it, produces the following theorem.


Theorem 8.6 If $\{\phi_0, \ldots, \phi_n\}$ is an orthogonal set of functions on an interval [a, b] with respect to the
weight function w, then the least squares approximation to f on [a, b] with respect to w is

\[ P(x) = \sum_{j=0}^{n} a_j \phi_j(x), \]

where, for each $j = 0, 1, \ldots, n$,

\[ a_j = \frac{\int_a^b w(x) \phi_j(x) f(x) \, dx}{\int_a^b w(x) [\phi_j(x)]^2 \, dx} = \frac{1}{\alpha_j} \int_a^b w(x) \phi_j(x) f(x) \, dx. \]


Although Definition 8.5 and Theorem 8.6 allow for broad classes of orthogonal functions,
we will consider only orthogonal sets of polynomials. The next theorem, which is
based on the Gram-Schmidt process, describes how to construct orthogonal polynomials
on [a, b] with respect to a weight function w.

Erhard Schmidt (1876-1959) received his doctorate under the supervision of David Hilbert in
1905 for a problem involving integral equations. Schmidt published a paper in 1907 in
which he gave what is now called the Gram-Schmidt process for constructing an orthonormal
basis for a set of functions. This generalized results of Jørgen Pedersen Gram (1850-1916) who
considered this problem when studying least squares. Laplace, however, presented a similar
process much earlier than either Gram or Schmidt.


Theorem 8.7 The set of polynomial functions $\{\phi_0, \phi_1, \ldots, \phi_n\}$ defined in the following way is orthogonal
on [a, b] with respect to the weight function w.

\[ \phi_0(x) \equiv 1, \quad \phi_1(x) = x - B_1, \quad \text{for each } x \text{ in } [a, b], \]

where

\[ B_1 = \frac{\int_a^b x \, w(x) [\phi_0(x)]^2 \, dx}{\int_a^b w(x) [\phi_0(x)]^2 \, dx}, \]

and when $k \geq 2$,

\[ \phi_k(x) = (x - B_k) \phi_{k-1}(x) - C_k \phi_{k-2}(x), \quad \text{for each } x \text{ in } [a, b], \]

where

\[ B_k = \frac{\int_a^b x \, w(x) [\phi_{k-1}(x)]^2 \, dx}{\int_a^b w(x) [\phi_{k-1}(x)]^2 \, dx} \]

and

\[ C_k = \frac{\int_a^b x \, w(x) \phi_{k-1}(x) \phi_{k-2}(x) \, dx}{\int_a^b w(x) [\phi_{k-2}(x)]^2 \, dx}. \]


Theorem 8.7 provides a recursive procedure for constructing a set of orthogonal
polynomials. The proof of this theorem follows by applying mathematical induction to the degree
of the polynomial $\phi_n(x)$.
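The recursion in Theorem 8.7 can be carried out exactly when $w(x) \equiv 1$, because every integrand is then a polynomial. The following Python sketch is an illustration under that assumption (a general weight function would need symbolic or numerical integration); the helper names are introduced here and polynomials are stored as coefficient lists, lowest degree first.

```python
from fractions import Fraction


def poly_mul(p, q):
    """Product of two polynomials given as coefficient lists."""
    r = [Fraction(0)] * (len(p) + len(q) - 1)
    for i, pi in enumerate(p):
        for j, qj in enumerate(q):
            r[i + j] += pi * qj
    return r


def poly_sub(p, q):
    """Difference p - q of two coefficient lists of possibly unequal length."""
    n = max(len(p), len(q))
    p = p + [Fraction(0)] * (n - len(p))
    q = q + [Fraction(0)] * (n - len(q))
    return [pi - qi for pi, qi in zip(p, q)]


def integrate(p, a, b):
    """Exact integral of polynomial p over [a, b] with weight w(x) = 1."""
    return sum(c * (b ** (k + 1) - a ** (k + 1)) / (k + 1)
               for k, c in enumerate(p))


def orthogonal_polys(n, a, b):
    """phi_0, ..., phi_n from the recurrence of Theorem 8.7 with w(x) = 1."""
    a, b = Fraction(a), Fraction(b)
    X = [Fraction(0), Fraction(1)]                  # the polynomial x
    phis = [[Fraction(1)]]                          # phi_0(x) = 1
    for k in range(1, n + 1):
        prev = phis[k - 1]
        prev_sq = poly_mul(prev, prev)
        B = integrate(poly_mul(X, prev_sq), a, b) / integrate(prev_sq, a, b)
        phi = poly_mul([-B, Fraction(1)], prev)     # (x - B_k) * phi_{k-1}
        if k >= 2:
            prev2 = phis[k - 2]
            C = (integrate(poly_mul(X, poly_mul(prev, prev2)), a, b)
                 / integrate(poly_mul(prev2, prev2), a, b))
            phi = poly_sub(phi, [C * c for c in prev2])
        phis.append(phi)
    return phis


# On [-1, 1] this should give the monic Legendre polynomials.
legendre = orthogonal_polys(5, -1, 1)
for p in legendre:
    print([str(c) for c in p])
```

On [-1, 1] the recursion produces monic polynomials, so the output can be compared directly with the Legendre polynomials listed in the Illustration that follows Corollary 8.8.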
Corollary 8.8 For any $n > 0$, the set of polynomial functions $\{\phi_0, \ldots, \phi_n\}$ given in Theorem 8.7 is linearly
independent on [a, b] and

\[ \int_a^b w(x) \phi_n(x) Q_k(x) \, dx = 0, \]

for any polynomial $Q_k(x)$ of degree $k < n$.


Proof For each $k = 0, 1, \ldots, n$, $\phi_k(x)$ is a polynomial of degree k. So Theorem 8.2 implies
that $\{\phi_0, \ldots, \phi_n\}$ is a linearly independent set.

Let $Q_k(x)$ be a polynomial of degree $k < n$. By Theorem 8.3 there exist numbers
$c_0, \ldots, c_k$ such that

\[ Q_k(x) = \sum_{j=0}^{k} c_j \phi_j(x). \]

Because $\phi_n$ is orthogonal to $\phi_j$ for each $j = 0, 1, \ldots, k$ we have

\[ \int_a^b w(x) Q_k(x) \phi_n(x) \, dx = \sum_{j=0}^{k} c_j \int_a^b w(x) \phi_j(x) \phi_n(x) \, dx = \sum_{j=0}^{k} c_j \cdot 0 = 0. \]


Illustration The set of Legendre polynomials, $\{P_n(x)\}$, is orthogonal on [-1, 1] with respect to the
weight function $w(x) \equiv 1$. The classical definition of the Legendre polynomials requires
that $P_n(1) = 1$ for each n, and a recursive relation is used to generate the polynomials
when $n \geq 2$. This normalization will not be needed in our discussion, and the least squares
approximating polynomials generated in either case are essentially the same.


Using the Gram-Schmidt process with $P_0(x) \equiv 1$ gives

\[ B_1 = \frac{\int_{-1}^{1} x \, dx}{\int_{-1}^{1} dx} = 0 \quad\text{and}\quad P_1(x) = (x - B_1) P_0(x) = x. \]

Also,

\[ B_2 = \frac{\int_{-1}^{1} x^3 \, dx}{\int_{-1}^{1} x^2 \, dx} = 0 \quad\text{and}\quad C_2 = \frac{\int_{-1}^{1} x^2 \, dx}{\int_{-1}^{1} 1 \, dx} = \frac{1}{3}, \]

so

\[ P_2(x) = (x - B_2) P_1(x) - C_2 P_0(x) = (x - 0)x - \frac{1}{3} \cdot 1 = x^2 - \frac{1}{3}. \]

The higher-degree Legendre polynomials shown in Figure 8.9 are derived in the same
manner. Although the integration can be tedious, it is not difficult with a Computer Algebra
System.

Figure 8.9 Graphs of the Legendre polynomials on [-1, 1].

For example, the Maple command int is used to compute the integrals $B_3$ and $C_3$. In the numerator
of $C_3$ the integrand is $x \, P_2(x) P_1(x) = x \cdot x \, (x^2 - \frac{1}{3})$, as required by Theorem 8.7:

    B3 := int(x*(x^2 - 1/3)^2, x = -1..1) / int((x^2 - 1/3)^2, x = -1..1);
    C3 := int(x*x*(x^2 - 1/3), x = -1..1) / int(x^2, x = -1..1);

Maple returns 0 for B3 and 4/15 for C3. Thus

\[ P_3(x) = x P_2(x) - \frac{4}{15} P_1(x) = x^3 - \frac{1}{3} x - \frac{4}{15} x = x^3 - \frac{3}{5} x. \]

The next two Legendre polynomials are

\[ P_4(x) = x^4 - \frac{6}{7} x^2 + \frac{3}{35} \quad\text{and}\quad P_5(x) = x^5 - \frac{10}{9} x^3 + \frac{5}{21} x. \]

The Legendre polynomials were introduced in Section 4.7, where their roots, given on
page 232, were used as the nodes in Gaussian quadrature.
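Theorem 8.6 can be combined with the Legendre polynomials above to compute a least squares approximation without solving a linear system. The Python sketch below is a hedged illustration: the target function $f(x) = e^x$ is an assumption chosen for the example (it is not taken from the text), the integrals are approximated with a composite Simpson's rule, and the names `LEGENDRE`, `simpson`, `legendre_least_squares`, and `evaluate` are introduced here.

```python
import math

# Monic Legendre polynomials P_0, ..., P_3 from the Illustration
# (orthogonal on [-1, 1] with weight w(x) = 1).
LEGENDRE = [
    lambda x: 1.0,
    lambda x: x,
    lambda x: x * x - 1.0 / 3.0,
    lambda x: x ** 3 - 0.6 * x,
]


def simpson(g, a, b, n=400):
    """Composite Simpson's rule on [a, b] with n (even) subintervals."""
    h = (b - a) / n
    total = g(a) + g(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * g(a + i * h)
    return total * h / 3


def legendre_least_squares(f, degree):
    """Coefficients a_j of Theorem 8.6 for the basis P_0, ..., P_degree."""
    coeffs = []
    for P in LEGENDRE[:degree + 1]:
        alpha = simpson(lambda x: P(x) ** 2, -1.0, 1.0)       # alpha_j
        coeffs.append(simpson(lambda x: P(x) * f(x), -1.0, 1.0) / alpha)
    return coeffs


def evaluate(coeffs, x):
    """Evaluate sum_j a_j P_j(x)."""
    return sum(c * P(x) for c, P in zip(coeffs, LEGENDRE))


coeffs = legendre_least_squares(math.exp, 3)
print([round(c, 6) for c in coeffs])
```

Each coefficient depends only on its own basis function, so the degree-2 approximation uses the first three of these coefficients unchanged. This is the saving over the monomial basis noted at the start of the subsection on linearly independent functions.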
EXERCISE SET 8.2


1. Find the linear least squares polynomial approximation to f(x) on the indicated interval if

a. $f(x) = x^2 + 3x + 2$, [0, 1];
b. $f(x) = x^3$, [0, 2];
c. $f(x) = \frac{1}{x}$, [1, 3];
d. $f(x) = e^x$, [0, 2];
e. $f(x) = \frac{1}{2} \cos x + \frac{1}{3} \sin 2x$, [0, 1];
f. $f(x) = x \ln x$, [1, 3].

2. Find the linear least squares polynomial approximation on the interval [-1, 1] for the following
functions.

a. $f(x) = x^2 - 2x + 3$
b. $f(x) = x^3$
c. $f(x) = \frac{1}{x + 2}$
d. $f(x) = e^x$
e. $f(x) = \frac{1}{2} \cos x + \frac{1}{3} \sin 2x$
f. $f(x) = \ln(x + 2)$
3. Find the least squares polynomial approximation of degree two to the functions and intervals in
Exercise 1.

4. Find the least squares polynomial approximation of degree 2 on the interval [-1, 1] for the functions
in Exercise 3.

5. Compute the error E for the approximations in Exercise 3.

6. Compute the error E for the approximations in Exercise 4.

7. Use the Gram-Schmidt process to construct $\phi_0(x)$, $\phi_1(x)$, $\phi_2(x)$, and $\phi_3(x)$ for the following intervals.
a. [0, 1] b. [0, 2] c. [1, 3]

8. Repeat Exercise 1 using the results of Exercise 7.

9. Obtain the least squares approximation polynomial of degree 3 for the functions in Exercise 1 using
the results of Exercise 7.

10. Repeat Exercise 3 using the results of Exercise 7.

11. Use the Gram-Schmidt procedure to calculate $L_1$, $L_2$, and $L_3$, where $\{L_0(x), L_1(x), L_2(x), L_3(x)\}$ is
an orthogonal set of polynomials on $(0, \infty)$ with respect to the weight function $w(x) = e^{-x}$ and
$L_0(x) \equiv 1$. The polynomials obtained from this procedure are called the Laguerre polynomials.
12. Use the Laguerre polynomials calculated in Exercise 11 to compute the least squares polynomials of
degree one, two, and three on the interval $(0, \infty)$ with respect to the weight function $w(x) = e^{-x}$ for
the following functions:

a. $f(x) = x^2$ b. $f(x) = e^{-x}$ c. $f(x) = x^3$ d. $f(x) = e^{-2x}$
13. Suppose $\{\phi_0, \phi_1, \ldots, \phi_n\}$ is any linearly independent set in $\Pi_n$. Show that for any element $Q \in \Pi_n$,
there exist unique constants $c_0, c_1, \ldots, c_n$, such that

\[ Q(x) = \sum_{k=0}^{n} c_k \phi_k(x). \]

14. Show that if $\{\phi_0, \phi_1, \ldots, \phi_n\}$ is an orthogonal set of functions on [a, b] with respect to the weight
function w, then $\{\phi_0, \phi_1, \ldots, \phi_n\}$ is a linearly independent set.


15. Show that the normal equations (8.6) have a unique solution. [Hint: Show that the only solution for the
function $f(x) \equiv 0$ is $a_j = 0$, $j = 0, 1, \ldots, n$. Multiply Eq. (8.6) by $a_j$, and sum over all j. Interchange
the integral sign and the summation sign to obtain $\int_a^b [P(x)]^2 \, dx = 0$. Thus, $P(x) \equiv 0$, so $a_j = 0$, for
$j = 0, \ldots, n$. Hence, the coefficient matrix is nonsingular, and there is a unique solution to Eq. (8.6).]


8.3 Chebyshev Polynomials and Economization of Power Series


The Chebyshev polynomials {7,,(x)} are orthogonal on (—1, 1) with respect to the weight 
function w(x) = (1 —x’)~'/? . Although they can be derived by the method in the previous 
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Pafnuty Lvovich Chebyshev 
(1821-1894) did exceptional 
mathematical work in many 
areas, including applied 
mathematics, number theory, 
approximation theory, and 
probability. In 1852 he traveled 
from St. Petersburg to visit 
mathematicians in France, 
England, and Germany. Lagrange 
and Legendre had studied 
individual sets of orthogonal 
polynomials, but Chebyshev was 
the first to see the important 
consequences of studying the 
theory in general. He developed 
the Chebyshev polynomials to 
study least squares 
approximation and probability, 
then applied his results to 
interpolation, approximate 
quadrature, and other areas. 
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section, it is easier to give their definition and then show that they satisfy the required 
orthogonality properties. 
For $x \in [-1, 1]$, define

\[ T_n(x) = \cos[n \arccos x], \quad \text{for each } n \ge 0. \tag{8.8} \]

It might not be obvious from this definition that for each $n$, $T_n(x)$ is a polynomial in $x$, but we will now show this. First note that

\[ T_0(x) = \cos 0 = 1 \quad \text{and} \quad T_1(x) = \cos(\arccos x) = x. \]

For $n \ge 1$, we introduce the substitution $\theta = \arccos x$ to change this equation to

\[ T_n(\theta(x)) \equiv T_n(\theta) = \cos(n\theta), \quad \text{where } \theta \in [0, \pi]. \]

A recurrence relation is derived by noting that

\[ T_{n+1}(\theta) = \cos((n + 1)\theta) = \cos\theta \cos(n\theta) - \sin\theta \sin(n\theta) \]

and

\[ T_{n-1}(\theta) = \cos((n - 1)\theta) = \cos\theta \cos(n\theta) + \sin\theta \sin(n\theta). \]

Adding these equations gives

\[ T_{n+1}(\theta) = 2\cos\theta \cos(n\theta) - T_{n-1}(\theta). \]

Returning to the variable $x = \cos\theta$, we have, for $n \ge 1$,

\[ T_{n+1}(x) = 2x \cos(n \arccos x) - T_{n-1}(x), \]

that is,

\[ T_{n+1}(x) = 2x T_n(x) - T_{n-1}(x). \tag{8.9} \]


Because $T_0(x) = 1$ and $T_1(x) = x$, the recurrence relation implies that the next three Chebyshev polynomials are

\[ T_2(x) = 2x T_1(x) - T_0(x) = 2x^2 - 1, \]
\[ T_3(x) = 2x T_2(x) - T_1(x) = 4x^3 - 3x, \]

and

\[ T_4(x) = 2x T_3(x) - T_2(x) = 8x^4 - 8x^2 + 1. \]

The recurrence relation also implies that when $n \ge 1$, $T_n(x)$ is a polynomial of degree $n$ with leading coefficient $2^{n-1}$. The graphs of $T_1$, $T_2$, $T_3$, and $T_4$ are shown in Figure 8.10.

Figure 8.10
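The recurrence (8.9) translates directly into a short program. The following Python sketch is only an illustration; the function name `chebyshev_coefficients` and the choice to store coefficients lowest degree first are assumptions made here, not part of the text.

```python
def chebyshev_coefficients(n):
    """Return the coefficients of T_n(x), lowest degree first, using Eq. (8.9)."""
    t_prev, t_curr = [1], [0, 1]          # T_0(x) = 1 and T_1(x) = x
    if n == 0:
        return t_prev
    for _ in range(n - 1):
        # T_{k+1}(x) = 2x T_k(x) - T_{k-1}(x):
        # multiplying by 2x shifts every coefficient up one degree and doubles it.
        t_next = [0] + [2 * c for c in t_curr]
        for i, c in enumerate(t_prev):
            t_next[i] -= c
        t_prev, t_curr = t_curr, t_next
    return t_curr


for k in range(5):
    print(k, chebyshev_coefficients(k))
```

For $n = 4$ the list returned is `[1, 0, -8, 0, 8]`, which agrees with $T_4(x) = 8x^4 - 8x^2 + 1$, and the last entry of each list is the leading coefficient $2^{n-1}$.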


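To show the orthogonality of the Chebyshev polynomials with respect to the weight function $w(x) = (1 - x^2)^{-1/2}$, consider

\[ \int_{-1}^{1} \frac{T_n(x) T_m(x)}{\sqrt{1 - x^2}}\,dx = \int_{-1}^{1} \frac{\cos(n \arccos x) \cos(m \arccos x)}{\sqrt{1 - x^2}}\,dx. \]

Reintroducing the substitution $\theta = \arccos x$ gives

\[ d\theta = -\frac{1}{\sqrt{1 - x^2}}\,dx \]

and

\[ \int_{-1}^{1} \frac{T_n(x) T_m(x)}{\sqrt{1 - x^2}}\,dx = -\int_{\pi}^{0} \cos(n\theta) \cos(m\theta)\,d\theta = \int_{0}^{\pi} \cos(n\theta) \cos(m\theta)\,d\theta. \]

Suppose $n \ne m$. Since

\[ \cos(n\theta) \cos(m\theta) = \frac{1}{2}[\cos((n + m)\theta) + \cos((n - m)\theta)], \]

we have

\[ \int_{-1}^{1} \frac{T_n(x) T_m(x)}{\sqrt{1 - x^2}}\,dx = \frac{1}{2} \int_{0}^{\pi} \cos((n + m)\theta)\,d\theta + \frac{1}{2} \int_{0}^{\pi} \cos((n - m)\theta)\,d\theta \]
\[ = \left[ \frac{1}{2(n + m)} \sin((n + m)\theta) + \frac{1}{2(n - m)} \sin((n - m)\theta) \right]_{0}^{\pi} = 0. \]

By a similar technique (see Exercise 9), we also have

\[ \int_{-1}^{1} \frac{[T_n(x)]^2}{\sqrt{1 - x^2}}\,dx = \frac{\pi}{2}, \quad \text{for each } n \ge 1. \tag{8.10} \]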
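The orthogonality relation and Eq. (8.10) can be checked numerically. The sketch below is an illustration only: it assumes the same substitution $x = \cos\theta$ used in the derivation, so the weight cancels and the integral of $\cos(n\theta)\cos(m\theta)$ over $[0, \pi]$ is estimated with a composite midpoint rule. The function name and the number of panels are choices made here.

```python
import math


def chebyshev_inner_product(n, m, panels=2000):
    """Midpoint-rule estimate of the weighted integral of T_n * T_m over [-1, 1].

    With x = cos(theta) the weight 1/sqrt(1 - x^2) cancels, leaving the
    integral of cos(n*theta) * cos(m*theta) over [0, pi].
    """
    h = math.pi / panels
    return h * sum(
        math.cos(n * (i + 0.5) * h) * math.cos(m * (i + 0.5) * h)
        for i in range(panels)
    )


print(chebyshev_inner_product(2, 3))   # close to 0 because n != m
print(chebyshev_inner_product(3, 3))   # close to pi/2, as in Eq. (8.10)
print(chebyshev_inner_product(0, 0))   # close to pi (the case n = m = 0)
```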
The Chebyshev polynomials are used to minimize approximation error. We will see how they are used to solve two problems of this type:

• an optimal placing of interpolating points to minimize the error in Lagrange interpolation;

• a means of reducing the degree of an approximating polynomial with minimal loss of accuracy.


The next result concerns the zeros and extreme points of $T_n(x)$.

Theorem 8.9 The Chebyshev polynomial $T_n(x)$ of degree $n \ge 1$ has $n$ simple zeros in $[-1, 1]$ at

\[ \bar{x}_k = \cos\left( \frac{2k - 1}{2n}\pi \right), \quad \text{for each } k = 1, 2, \ldots, n. \]

Moreover, $T_n(x)$ assumes its absolute extrema at

\[ \bar{x}'_k = \cos\left( \frac{k\pi}{n} \right) \quad \text{with} \quad T_n(\bar{x}'_k) = (-1)^k, \quad \text{for each } k = 0, 1, \ldots, n. \]

Proof Let

\[ \bar{x}_k = \cos\left( \frac{2k - 1}{2n}\pi \right), \quad \text{for } k = 1, 2, \ldots, n. \]

Then

\[ T_n(\bar{x}_k) = \cos(n \arccos \bar{x}_k) = \cos\left( n \arccos\left( \cos\left( \frac{2k - 1}{2n}\pi \right) \right) \right) = \cos\left( \frac{2k - 1}{2}\pi \right) = 0. \]

But the $\bar{x}_k$ are distinct (see Exercise 10) and $T_n(x)$ is a polynomial of degree $n$, so all the zeros of $T_n(x)$ must have this form.

To show the second statement, first note that

\[ T'_n(x) = \frac{d}{dx}[\cos(n \arccos x)] = \frac{n \sin(n \arccos x)}{\sqrt{1 - x^2}}, \]

and that, when $k = 1, 2, \ldots, n - 1$,

\[ T'_n(\bar{x}'_k) = \frac{n \sin\left( n \arccos\left( \cos\left( \frac{k\pi}{n} \right) \right) \right)}{\sqrt{1 - \left[ \cos\left( \frac{k\pi}{n} \right) \right]^2}} = \frac{n \sin(k\pi)}{\sin\left( \frac{k\pi}{n} \right)} = 0. \]

Since $T_n(x)$ is a polynomial of degree $n$, its derivative $T'_n(x)$ is a polynomial of degree $(n - 1)$, and all the zeros of $T'_n(x)$ occur at these $n - 1$ distinct points (that they are distinct is considered in Exercise 11). The only other possibilities for extrema of $T_n(x)$ occur at the endpoints of the interval $[-1, 1]$; that is, at $\bar{x}'_0 = 1$ and at $\bar{x}'_n = -1$.

For any $k = 0, 1, \ldots, n$ we have

\[ T_n(\bar{x}'_k) = \cos\left( n \arccos\left( \cos\left( \frac{k\pi}{n} \right) \right) \right) = \cos(k\pi) = (-1)^k. \]

So a maximum occurs at each even value of $k$ and a minimum at each odd value. ∎
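The formulas in Theorem 8.9 are easy to check numerically. The following Python sketch is an illustration only; the function names are chosen here, and $T_n$ is evaluated from the defining formula (8.8) rather than from its polynomial form.

```python
import math


def chebyshev_T(n, x):
    """Evaluate T_n(x) from the definition (8.8); requires -1 <= x <= 1."""
    return math.cos(n * math.acos(x))


def chebyshev_zeros(n):
    """Zeros of T_n: cos((2k - 1) pi / (2n)) for k = 1, ..., n."""
    return [math.cos((2 * k - 1) * math.pi / (2 * n)) for k in range(1, n + 1)]


def chebyshev_extrema(n):
    """Extreme points of T_n: cos(k pi / n) for k = 0, ..., n."""
    return [math.cos(k * math.pi / n) for k in range(n + 1)]


n = 5
for z in chebyshev_zeros(n):
    print(f"zero {z: .6f}: T_n = {chebyshev_T(n, z): .2e}")
for k, x in enumerate(chebyshev_extrema(n)):
    print(f"extremum {x: .6f}: T_n = {chebyshev_T(n, x): .6f}, (-1)^k = {(-1) ** k}")
```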


The monic (polynomials with leading coefficient 1) Chebyshev polynomials $\tilde{T}_n(x)$ are derived from the Chebyshev polynomials $T_n(x)$ by dividing by the leading coefficient $2^{n-1}$. Hence

\[ \tilde{T}_0(x) = 1 \quad \text{and} \quad \tilde{T}_n(x) = \frac{1}{2^{n-1}} T_n(x), \quad \text{for each } n \ge 1. \tag{8.11} \]

The recurrence relationship satisfied by the Chebyshev polynomials implies that

\[ \tilde{T}_2(x) = x \tilde{T}_1(x) - \frac{1}{2} \tilde{T}_0(x) \quad \text{and} \quad \tilde{T}_{n+1}(x) = x \tilde{T}_n(x) - \frac{1}{4} \tilde{T}_{n-1}(x), \quad \text{for each } n \ge 2. \tag{8.12} \]

The graphs of $\tilde{T}_1$, $\tilde{T}_2$, $\tilde{T}_3$, $\tilde{T}_4$, and $\tilde{T}_5$ are shown in Figure 8.11.

Figure 8.11

Because $\tilde{T}_n(x)$ is just a multiple of $T_n(x)$, Theorem 8.9 implies that the zeros of $\tilde{T}_n(x)$ also occur at

\[ \bar{x}_k = \cos\left( \frac{2k - 1}{2n}\pi \right), \quad \text{for each } k = 1, 2, \ldots, n, \]

and the extreme values of $\tilde{T}_n(x)$, for $n \ge 1$, occur at

\[ \bar{x}'_k = \cos\left( \frac{k\pi}{n} \right), \quad \text{with} \quad \tilde{T}_n(\bar{x}'_k) = \frac{(-1)^k}{2^{n-1}}, \quad \text{for each } k = 0, 1, 2, \ldots, n. \tag{8.13} \]

Let $\tilde{\Pi}_n$ denote the set of all monic polynomials of degree $n$. The relation expressed in Eq. (8.13) leads to an important minimization property that distinguishes $\tilde{T}_n(x)$ from the other members of $\tilde{\Pi}_n$.

Theorem 8.10 The polynomials of the form $\tilde{T}_n(x)$, when $n \ge 1$, have the property that

\[ \frac{1}{2^{n-1}} = \max_{x \in [-1, 1]} |\tilde{T}_n(x)| \le \max_{x \in [-1, 1]} |P_n(x)|, \quad \text{for all } P_n(x) \in \tilde{\Pi}_n. \]


Moreover, equality occurs only if $P_n \equiv \tilde{T}_n$.

Proof Suppose that $P_n(x) \in \tilde{\Pi}_n$ and that

\[ \max_{x \in [-1, 1]} |P_n(x)| \le \frac{1}{2^{n-1}} = \max_{x \in [-1, 1]} |\tilde{T}_n(x)|. \]

Let $Q = \tilde{T}_n - P_n$. Then $\tilde{T}_n(x)$ and $P_n(x)$ are both monic polynomials of degree $n$, so $Q(x)$ is a polynomial of degree at most $(n - 1)$. Moreover, at the $n + 1$ extreme points $\bar{x}'_k$ of $\tilde{T}_n(x)$, we have

\[ Q(\bar{x}'_k) = \tilde{T}_n(\bar{x}'_k) - P_n(\bar{x}'_k) = \frac{(-1)^k}{2^{n-1}} - P_n(\bar{x}'_k). \]

However,

\[ |P_n(\bar{x}'_k)| \le \frac{1}{2^{n-1}}, \quad \text{for each } k = 0, 1, \ldots, n, \]

so we have

\[ Q(\bar{x}'_k) \le 0, \text{ when } k \text{ is odd} \quad \text{and} \quad Q(\bar{x}'_k) \ge 0, \text{ when } k \text{ is even}. \]

Since $Q$ is continuous, the Intermediate Value Theorem implies that for each $j = 0, 1, \ldots, n - 1$ the polynomial $Q(x)$ has at least one zero between $\bar{x}'_j$ and $\bar{x}'_{j+1}$. Thus, $Q$ has at least $n$ zeros in the interval $[-1, 1]$. But the degree of $Q(x)$ is less than $n$, so $Q \equiv 0$. This implies that $P_n \equiv \tilde{T}_n$. ∎


Minimizing Lagrange Interpolation Error

Theorem 8.10 can be used to answer the question of where to place interpolating nodes to minimize the error in Lagrange interpolation. Theorem 3.3 on page 112 applied to the interval $[-1, 1]$ states that, if $x_0, \ldots, x_n$ are distinct numbers in the interval $[-1, 1]$ and if $f \in C^{n+1}[-1, 1]$, then, for each $x \in [-1, 1]$, a number $\xi(x)$ exists in $(-1, 1)$ with

\[ f(x) - P(x) = \frac{f^{(n+1)}(\xi(x))}{(n + 1)!} (x - x_0)(x - x_1) \cdots (x - x_n), \]

where $P(x)$ is the Lagrange interpolating polynomial. Generally, there is no control over $\xi(x)$, so to minimize the error by shrewd placement of the nodes $x_0, \ldots, x_n$, we choose $x_0, \ldots, x_n$ to minimize the quantity

\[ |(x - x_0)(x - x_1) \cdots (x - x_n)| \]

throughout the interval $[-1, 1]$.

Since $(x - x_0)(x - x_1) \cdots (x - x_n)$ is a monic polynomial of degree $(n + 1)$, we have just seen that the minimum is obtained when

\[ (x - x_0)(x - x_1) \cdots (x - x_n) = \tilde{T}_{n+1}(x). \]

The maximum value of $|(x - x_0)(x - x_1) \cdots (x - x_n)|$ is smallest when $x_k$ is chosen for each $k = 0, 1, \ldots, n$ to be the $(k + 1)$st zero of $\tilde{T}_{n+1}$. Hence we choose $x_k$ to be

\[ \bar{x}_{k+1} = \cos\left( \frac{2k + 1}{2(n + 1)}\pi \right). \]

Because $\max_{x \in [-1, 1]} |\tilde{T}_{n+1}(x)| = 2^{-n}$, this also implies that

\[ \frac{1}{2^n} = \max_{x \in [-1, 1]} |(x - \bar{x}_1) \cdots (x - \bar{x}_{n+1})| \le \max_{x \in [-1, 1]} |(x - x_0) \cdots (x - x_n)|, \]

for any choice of $x_0, x_1, \ldots, x_n$ in the interval $[-1, 1]$. The next corollary follows from these observations.

Corollary 8.11 Suppose that $P(x)$ is the interpolating polynomial of degree at most $n$ with nodes at the zeros of $T_{n+1}(x)$. Then

\[ \max_{x \in [-1, 1]} |f(x) - P(x)| \le \frac{1}{2^n (n + 1)!} \max_{x \in [-1, 1]} |f^{(n+1)}(x)|, \quad \text{for each } f \in C^{n+1}[-1, 1]. \]
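The effect of node placement on the factor $|(x - x_0) \cdots (x - x_n)|$ can be seen by sampling it on a fine grid. The sketch below is an illustration under assumptions made here (the function name, the sample count, and the choice $n = 4$); it compares equally spaced nodes with the zeros of $T_{n+1}$, for which the maximum should be close to $2^{-n}$.

```python
import math


def max_node_polynomial(nodes, samples=20001):
    """Largest sampled value of |(x - x_0)(x - x_1)...(x - x_n)| on [-1, 1]."""
    best = 0.0
    for i in range(samples):
        x = -1.0 + 2.0 * i / (samples - 1)
        best = max(best, abs(math.prod(x - node for node in nodes)))
    return best


n = 4
equal_nodes = [-1.0 + 2.0 * k / n for k in range(n + 1)]
cheb_nodes = [math.cos((2 * k + 1) * math.pi / (2 * (n + 1))) for k in range(n + 1)]

print("equally spaced nodes:", max_node_polynomial(equal_nodes))
print("Chebyshev nodes:     ", max_node_polynomial(cheb_nodes))
print("2^(-n):              ", 2.0 ** (-n))
```

Because the maximum is only sampled, the value reported for equally spaced nodes is a lower estimate of the true maximum; for the Chebyshev nodes the maximum $2^{-n}$ is attained at the endpoints, which are grid points.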


Minimizing Approximation Error on Arbitrary Intervals

The technique for choosing points to minimize the interpolating error is extended to a general closed interval $[a, b]$ by using the change of variables

\[ \tilde{x} = \frac{1}{2}[(b - a)x + a + b] \]

to transform the numbers $\bar{x}_k$ in the interval $[-1, 1]$ into the corresponding number $\tilde{x}_k$ in the interval $[a, b]$, as shown in the next example.


Example 1 Let $f(x) = xe^x$ on $[0, 1.5]$. Compare the values given by the Lagrange polynomial with four equally-spaced nodes with those given by the Lagrange polynomial with nodes given by zeros of the fourth Chebyshev polynomial.

Solution The equally-spaced nodes $x_0 = 0$, $x_1 = 0.5$, $x_2 = 1$, and $x_3 = 1.5$ give

\[ L_0(x) = -1.3333x^3 + 4.0000x^2 - 3.6667x + 1, \]
\[ L_1(x) = 4.0000x^3 - 10.000x^2 + 6.0000x, \]
\[ L_2(x) = -4.0000x^3 + 8.0000x^2 - 3.0000x, \]
\[ L_3(x) = 1.3333x^3 - 2.000x^2 + 0.66667x, \]

which produces the polynomial

\[ P_3(x) = L_0(x)(0) + L_1(x)(0.5e^{0.5}) + L_2(x)e^1 + L_3(x)(1.5e^{1.5}) = 1.3875x^3 + 0.057570x^2 + 1.2730x. \]

For the second interpolating polynomial, we shift the zeros $\bar{x}_k = \cos((2k + 1)\pi/8)$, for $k = 0, 1, 2, 3$, of $\tilde{T}_4$ from $[-1, 1]$ to $[0, 1.5]$, using the linear transformation

\[ \tilde{x}_k = \frac{1}{2}[(1.5 - 0)\bar{x}_k + (1.5 + 0)] = 0.75 + 0.75\bar{x}_k. \]

Because

\[ \bar{x}_0 = \cos\frac{\pi}{8} = 0.92388, \quad \bar{x}_1 = \cos\frac{3\pi}{8} = 0.38268, \quad \bar{x}_2 = \cos\frac{5\pi}{8} = -0.38268, \quad \text{and} \quad \bar{x}_3 = \cos\frac{7\pi}{8} = -0.92388, \]

we have

\[ \tilde{x}_0 = 1.44291, \quad \tilde{x}_1 = 1.03701, \quad \tilde{x}_2 = 0.46299, \quad \text{and} \quad \tilde{x}_3 = 0.05709. \]

The Lagrange coefficient polynomials for this set of nodes are

\[ \tilde{L}_0(x) = 1.8142x^3 - 2.8249x^2 + 1.0264x - 0.049728, \]
\[ \tilde{L}_1(x) = -4.3799x^3 + 8.5977x^2 - 3.4026x + 0.16705, \]
\[ \tilde{L}_2(x) = 4.3799x^3 - 11.112x^2 + 7.1738x - 0.37415, \]
\[ \tilde{L}_3(x) = -1.8142x^3 + 5.3390x^2 - 4.7976x + 1.2568. \]

The functional values required for these polynomials are given in the last two columns of Table 8.7. The interpolation polynomial of degree at most 3 is

\[ \tilde{P}_3(x) = 1.3811x^3 + 0.044652x^2 + 1.3031x - 0.014352. \]

Table 8.7

    $x$            $f(x) = xe^x$    $\tilde{x}$                $f(\tilde{x}) = \tilde{x}e^{\tilde{x}}$
    $x_0 = 0.0$    0.00000          $\tilde{x}_0 = 1.44291$    6.10783
    $x_1 = 0.5$    0.824361         $\tilde{x}_1 = 1.03701$    2.92517
    $x_2 = 1.0$    2.71828          $\tilde{x}_2 = 0.46299$    0.73560
    $x_3 = 1.5$    6.72253          $\tilde{x}_3 = 0.05709$    0.060444

For comparison, Table 8.8 lists various values of $x$, together with the values of $f(x)$, $P_3(x)$, and $\tilde{P}_3(x)$. It can be seen from this table that, although the error using $P_3(x)$ is less than using $\tilde{P}_3(x)$ near the middle of the table, the maximum error involved with using $\tilde{P}_3(x)$, 0.0180, is considerably less than when using $P_3(x)$, which gives the error 0.0290. (See Figure 8.12.)

Table 8.8

    $x$     $f(x) = xe^x$   $P_3(x)$   $|xe^x - P_3(x)|$   $\tilde{P}_3(x)$   $|xe^x - \tilde{P}_3(x)|$
    0.15    0.1743          0.1969     0.0226              0.1868             0.0125
    0.25    0.3210          0.3435     0.0225              0.3358             0.0148
    0.35    0.4967          0.5121     0.0154              0.5064             0.0097
    0.65    1.245           1.233      0.012               1.231              0.014
    0.75    1.588           1.572      0.016               1.571              0.017
    0.85    1.989           1.976      0.013               1.974              0.015
    1.15    3.632           3.650      0.018               3.644              0.012
    1.25    4.363           4.391      0.028               4.382              0.019
    1.35    5.208           5.237      0.029               5.224              0.016

Figure 8.12
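The comparison in Example 1 can be repeated with a few lines of code. The following Python sketch is an illustration only: it evaluates each interpolating polynomial directly in Lagrange form instead of expanding it in powers of $x$, and the function names and the sampling grid are assumptions made here.

```python
import math


def lagrange_eval(nodes, values, x):
    """Evaluate the Lagrange interpolating polynomial through (nodes, values) at x."""
    total = 0.0
    for k, (xk, yk) in enumerate(zip(nodes, values)):
        term = yk
        for j, xj in enumerate(nodes):
            if j != k:
                term *= (x - xj) / (xk - xj)
        total += term
    return total


def f(x):
    return x * math.exp(x)


a, b = 0.0, 1.5
equal_nodes = [a + (b - a) * k / 3 for k in range(4)]
# Zeros of T_4 on [-1, 1], moved to [a, b] by x~ = ((b - a) x + a + b) / 2.
cheb_nodes = [0.5 * ((b - a) * math.cos((2 * k + 1) * math.pi / 8) + a + b)
              for k in range(4)]


def max_error(nodes, samples=1501):
    """Largest sampled value of |f(x) - P(x)| on [a, b]."""
    values = [f(x) for x in nodes]
    grid = [a + (b - a) * i / (samples - 1) for i in range(samples)]
    return max(abs(f(x) - lagrange_eval(nodes, values, x)) for x in grid)


print("Chebyshev nodes:", [round(x, 5) for x in cheb_nodes])
print("max error, equally spaced:", max_error(equal_nodes))
print("max error, Chebyshev:     ", max_error(cheb_nodes))
```

The grid here covers all of $[0, 1.5]$, so the sampled maxima need not coincide with the largest entries of Table 8.8, which lists only nine points.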
Reducing the Degree of Approximating Polynomials

Chebyshev polynomials can also be used to reduce the degree of an approximating polynomial with a minimal loss of accuracy. Because the Chebyshev polynomials have a minimum maximum-absolute value that is spread uniformly on an interval, they can be used to reduce the degree of an approximation polynomial without exceeding the error tolerance.

Consider approximating an arbitrary $n$th-degree polynomial

\[ P_n(x) = a_n x^n + a_{n-1} x^{n-1} + \cdots + a_1 x + a_0 \]

on $[-1, 1]$ with a polynomial of degree at most $n - 1$. The object is to choose $P_{n-1}(x)$ in $\Pi_{n-1}$ so that

\[ \max_{x \in [-1, 1]} |P_n(x) - P_{n-1}(x)| \]

is as small as possible.

We first note that $(P_n(x) - P_{n-1}(x))/a_n$ is a monic polynomial of degree $n$, so applying Theorem 8.10 gives

\[ \max_{x \in [-1, 1]} \left| \frac{1}{a_n}(P_n(x) - P_{n-1}(x)) \right| \ge \frac{1}{2^{n-1}}. \]

Equality occurs precisely when

\[ \frac{1}{a_n}(P_n(x) - P_{n-1}(x)) = \tilde{T}_n(x). \]

This means that we should choose

\[ P_{n-1}(x) = P_n(x) - a_n \tilde{T}_n(x), \]

and with this choice we have the minimum value of

\[ \max_{x \in [-1, 1]} |P_n(x) - P_{n-1}(x)| = |a_n| \max_{x \in [-1, 1]} \left| \frac{1}{a_n}(P_n(x) - P_{n-1}(x)) \right| = \frac{|a_n|}{2^{n-1}}. \]


Illustration The function $f(x) = e^x$ is approximated on the interval $[-1, 1]$ by the fourth Maclaurin polynomial

\[ P_4(x) = 1 + x + \frac{x^2}{2} + \frac{x^3}{6} + \frac{x^4}{24}, \]

which has truncation error

\[ |R_4(x)| = \frac{|f^{(5)}(\xi(x))|\,|x^5|}{120} \le \frac{e}{120} \approx 0.023, \quad \text{for } -1 \le x \le 1. \]

Suppose that an error of 0.05 is tolerable and that we would like to reduce the degree of the approximating polynomial while staying within this bound.

The polynomial of degree 3 or less that best uniformly approximates $P_4(x)$ on $[-1, 1]$ is

\[ P_3(x) = P_4(x) - a_4 \tilde{T}_4(x) = 1 + x + \frac{x^2}{2} + \frac{x^3}{6} + \frac{x^4}{24} - \frac{1}{24}\left( x^4 - x^2 + \frac{1}{8} \right) = \frac{191}{192} + x + \frac{13}{24}x^2 + \frac{1}{6}x^3. \]

With this choice, we have

\[ |P_4(x) - P_3(x)| = |a_4 \tilde{T}_4(x)| \le \frac{1}{24} \cdot \frac{1}{2^3} = \frac{1}{192} \le 0.0053. \]

Adding this error bound to the bound for the Maclaurin truncation error gives

\[ 0.023 + 0.0053 = 0.0283, \]

which is within the permissible error of 0.05.

The polynomial of degree 2 or less that best uniformly approximates $P_3(x)$ on $[-1, 1]$ is

\[ P_2(x) = P_3(x) - \frac{1}{6}\tilde{T}_3(x) = \frac{191}{192} + x + \frac{13}{24}x^2 + \frac{1}{6}x^3 - \frac{1}{6}\left( x^3 - \frac{3}{4}x \right) = \frac{191}{192} + \frac{9}{8}x + \frac{13}{24}x^2. \]

However,

\[ |P_3(x) - P_2(x)| = \left| \frac{1}{6}\tilde{T}_3(x) \right| \le \frac{1}{6}\left( \frac{1}{2} \right)^2 = \frac{1}{24} \approx 0.042, \]

which, when added to the already accumulated error bound of 0.0283, exceeds the tolerance of 0.05. Consequently, the polynomial of least degree that best approximates $e^x$ on $[-1, 1]$ with an error bound of less than 0.05 is

\[ P_3(x) = \frac{191}{192} + x + \frac{13}{24}x^2 + \frac{1}{6}x^3. \]

Table 8.9 lists the function and the approximating polynomials at various points in $[-1, 1]$. Note that the tabulated entries for $P_2$ are well within the tolerance of 0.05, even though the error bound for $P_2(x)$ exceeded the tolerance.

Table 8.9

    $x$      $e^x$      $P_4(x)$   $P_3(x)$   $P_2(x)$   $|e^x - P_2(x)|$
    −0.75    0.47237    0.47412    0.47917    0.45573    0.01664
    −0.25    0.77880    0.77881    0.77604    0.74740    0.03140
     0.00    1.00000    1.00000    0.99479    0.99479    0.00521
     0.25    1.28403    1.28402    1.28125    1.30990    0.02587
     0.75    2.11700    2.11475    2.11979    2.14323    0.02623

EXERCISE SET 8.3

1. Use the zeros of $\tilde{T}_3$ to construct an interpolating polynomial of degree 2 for the following functions on the interval $[-1, 1]$.
   a. $f(x) = e^x$   b. $f(x) = \sin x$   c. $f(x) = \ln(x + 2)$   d. $f(x) = x^4$

2. Use the zeros of $\tilde{T}_4$ to construct an interpolating polynomial of degree 3 for the functions in Exercise 1.

3. Find a bound for the maximum error of the approximation in Exercise 1 on the interval $[-1, 1]$.

4. Repeat Exercise 3 for the approximations computed in Exercise 2.

5. Use the zeros of $\tilde{T}_3$ and transformations of the given interval to construct an interpolating polynomial of degree 2 for the following functions.
   a. $f(x) = \frac{1}{x}$, $[1, 3]$   b. $f(x) = e^{-x}$, $[0, 2]$
   c. $f(x) = \frac{1}{2}\cos x + \frac{1}{3}\sin 2x$, $[0, 1]$   d. $f(x) = x \ln x$, $[1, 3]$

6. Find the sixth Maclaurin polynomial for $xe^x$, and use Chebyshev economization to obtain a lesser-degree polynomial approximation while keeping the error less than 0.01 on $[-1, 1]$.

7. Find the sixth Maclaurin polynomial for $\sin x$, and use Chebyshev economization to obtain a lesser-degree polynomial approximation while keeping the error less than 0.01 on $[-1, 1]$.

8. Show that for any positive integers $i$ and $j$ with $i > j$, we have $T_i(x)T_j(x) = \frac{1}{2}[T_{i+j}(x) + T_{i-j}(x)]$.

9. Show that for each Chebyshev polynomial $T_n(x)$, we have

\[ \int_{-1}^{1} \frac{[T_n(x)]^2}{\sqrt{1 - x^2}}\,dx = \frac{\pi}{2}. \]

10. Show that for each $n$, the Chebyshev polynomial $T_n(x)$ has $n$ distinct zeros in $(-1, 1)$.

11. Show that for each $n$, the derivative of the Chebyshev polynomial $T_n(x)$ has $n - 1$ distinct zeros in $(-1, 1)$.


8.4 Rational Function Approximation


The class of algebraic polynomials has some distinct advantages for use in approximation: 


• There are a sufficient number of polynomials to approximate any continuous function on a closed interval to within an arbitrary tolerance;

• Polynomials are easily evaluated at arbitrary values; and

• The derivatives and integrals of polynomials exist and are easily determined.


The disadvantage of using polynomials for approximation is their tendency for oscil- 
lation. This often causes error bounds in polynomial approximation to significantly exceed 
the average approximation error, because error bounds are determined by the maximum 
approximation error. We now consider methods that spread the approximation error more 
evenly over the approximation interval. These techniques involve rational functions. 

A rational function $r$ of degree $N$ has the form

\[ r(x) = \frac{p(x)}{q(x)}, \]

where $p(x)$ and $q(x)$ are polynomials whose degrees sum to $N$.

Every polynomial is a rational function (simply let $q(x) \equiv 1$), so approximation by rational functions gives results that are no worse than approximation by polynomials. However, rational functions whose numerator and denominator have the same or nearly the same degree often produce approximation results superior to polynomial methods for the same amount of computational effort. (This statement is based on the assumption that the amount of computational effort required for division is approximately the same as for multiplication.)

Rational functions have the added advantage of permitting efficient approximation of functions with infinite discontinuities near, but outside, the interval of approximation. Polynomial approximation is generally unacceptable in this situation.

Padé Approximation

Henri Padé (1863-1953) gave a systematic study of what we call today Padé approximations in his doctoral thesis in 1892. He proved results on their general structure and also clearly set out the connection between Padé approximations and continued fractions. These ideas, however, had been studied by Daniel Bernoulli (1700-1782) and others as early as 1730. James Stirling (1692-1770) gave a similar method in Methodus differentialis published in the same year, and Euler used Padé-type approximation to find the sum of a series.
Suppose $r$ is a rational function of degree $N = n + m$ of the form

\[ r(x) = \frac{p(x)}{q(x)} = \frac{p_0 + p_1 x + \cdots + p_n x^n}{q_0 + q_1 x + \cdots + q_m x^m}, \]

that is used to approximate a function $f$ on a closed interval $I$ containing zero. For $r$ to be defined at zero requires that $q_0 \ne 0$. In fact, we can assume that $q_0 = 1$, for if this is not the case we simply replace $p(x)$ by $p(x)/q_0$ and $q(x)$ by $q(x)/q_0$. Consequently, there are $N + 1$ parameters $q_1, q_2, \ldots, q_m, p_0, p_1, \ldots, p_n$ available for the approximation of $f$ by $r$.

The Padé approximation technique is the extension of Taylor polynomial approximation to rational functions. It chooses the $N + 1$ parameters so that $f^{(k)}(0) = r^{(k)}(0)$, for each $k = 0, 1, \ldots, N$. When $n = N$ and $m = 0$, the Padé approximation is simply the $N$th Maclaurin polynomial.

Consider the difference

\[ f(x) - r(x) = f(x) - \frac{p(x)}{q(x)} = \frac{f(x)q(x) - p(x)}{q(x)} = \frac{f(x)\sum_{i=0}^{m} q_i x^i - \sum_{i=0}^{n} p_i x^i}{q(x)}, \]

and suppose $f$ has the Maclaurin series expansion $f(x) = \sum_{i=0}^{\infty} a_i x^i$. Then

\[ f(x) - r(x) = \frac{\sum_{i=0}^{\infty} a_i x^i \sum_{i=0}^{m} q_i x^i - \sum_{i=0}^{n} p_i x^i}{q(x)}. \tag{8.14} \]

The object is to choose the constants $q_1, q_2, \ldots, q_m$ and $p_0, p_1, \ldots, p_n$ so that

\[ f^{(k)}(0) - r^{(k)}(0) = 0, \quad \text{for each } k = 0, 1, \ldots, N. \]

In Section 2.4 (see, in particular, Exercise 10 on page 86) we found that this is equivalent to $f - r$ having a zero of multiplicity $N + 1$ at $x = 0$. As a consequence, we choose $q_1, q_2, \ldots, q_m$ and $p_0, p_1, \ldots, p_n$ so that the numerator on the right side of Eq. (8.14),

\[ (a_0 + a_1 x + \cdots)(1 + q_1 x + \cdots + q_m x^m) - (p_0 + p_1 x + \cdots + p_n x^n), \tag{8.15} \]

has no terms of degree less than or equal to $N$.

To simplify notation, we define $p_{n+1} = p_{n+2} = \cdots = p_N = 0$ and $q_{m+1} = q_{m+2} = \cdots = q_N = 0$. We can then express the coefficient of $x^k$ in expression (8.15) more compactly as

\[ \left( \sum_{i=0}^{k} a_i q_{k-i} \right) - p_k. \]

The rational function for Padé approximation results from the solution of the $N + 1$ linear equations

\[ \sum_{i=0}^{k} a_i q_{k-i} = p_k, \quad k = 0, 1, \ldots, N \]

in the $N + 1$ unknowns $q_1, q_2, \ldots, q_m, p_0, p_1, \ldots, p_n$.
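The linear system above splits naturally into two parts: the equations for $k = n + 1, \ldots, N$ involve only $q_1, \ldots, q_m$, and once these are known the equations for $k = 0, \ldots, n$ give each $p_k$ directly. The following Python sketch uses that observation; it is an illustration, not Algorithm 8.1 (it uses exact `Fraction` arithmetic and a plain Gauss-Jordan elimination, with no partial pivoting), and the names `solve_linear` and `pade` are chosen here.

```python
from fractions import Fraction
from math import factorial


def solve_linear(A, b):
    """Solve A x = b by Gauss-Jordan elimination in exact Fraction arithmetic."""
    size = len(b)
    M = [list(row) + [rhs] for row, rhs in zip(A, b)]
    for col in range(size):
        pivot = next((r for r in range(col, size) if M[r][col] != 0), None)
        if pivot is None:
            raise ValueError("The system is singular")
        M[col], M[pivot] = M[pivot], M[col]
        M[col] = [v / M[col][col] for v in M[col]]
        for r in range(size):
            if r != col:
                factor = M[r][col]
                M[r] = [v - factor * w for v, w in zip(M[r], M[col])]
    return [M[r][size] for r in range(size)]


def pade(a, n, m):
    """Return (p, q) with q[0] = 1 from Maclaurin coefficients a[0], ..., a[n + m]."""
    def coef(i):
        return a[i] if i >= 0 else Fraction(0)

    # Equations k = n+1, ..., n+m: sum_{j=1}^{m} a_{k-j} q_j = -a_k.
    A = [[coef(k - j) for j in range(1, m + 1)] for k in range(n + 1, n + m + 1)]
    b = [-a[k] for k in range(n + 1, n + m + 1)]
    q = [Fraction(1)] + (solve_linear(A, b) if m > 0 else [])
    # Equations k = 0, ..., n: p_k = sum_{j=0}^{min(k, m)} a_{k-j} q_j.
    p = [sum(coef(k - j) * q[j] for j in range(min(k, m) + 1)) for k in range(n + 1)]
    return p, q


# Maclaurin coefficients of e^(-x): a_i = (-1)^i / i!
a = [Fraction((-1) ** i, factorial(i)) for i in range(6)]
p, q = pade(a, 3, 2)
print("p:", p)
print("q:", q)
```

With $n = 3$ and $m = 2$ this gives $p = (1, -\frac{3}{5}, \frac{3}{20}, -\frac{1}{60})$ and $q = (1, \frac{2}{5}, \frac{1}{20})$.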


Example 1 The Maclaurin series expansion for $e^{-x}$ is

\[ \sum_{i=0}^{\infty} \frac{(-1)^i}{i!} x^i. \]

Find the Padé approximation to $e^{-x}$ of degree 5 with $n = 3$ and $m = 2$.

Solution To find the Padé approximation we need to choose $p_0$, $p_1$, $p_2$, $p_3$, $q_1$, and $q_2$ so that the coefficients of $x^k$ for $k = 0, 1, \ldots, 5$ are 0 in the expression

\[ \left( 1 - x + \frac{x^2}{2} - \frac{x^3}{6} + \cdots \right)(1 + q_1 x + q_2 x^2) - (p_0 + p_1 x + p_2 x^2 + p_3 x^3). \]

Expanding and collecting terms produces

\[ x^5: \; -\frac{1}{120} + \frac{1}{24}q_1 - \frac{1}{6}q_2 = 0; \qquad x^2: \; \frac{1}{2} - q_1 + q_2 = p_2; \]
\[ x^4: \; \frac{1}{24} - \frac{1}{6}q_1 + \frac{1}{2}q_2 = 0; \qquad x^1: \; -1 + q_1 = p_1; \]
\[ x^3: \; -\frac{1}{6} + \frac{1}{2}q_1 - q_2 = p_3; \qquad x^0: \; 1 = p_0. \]

To solve the system in Maple, we use the following commands:

    eq1 := -1 + q1 = p1:
    eq2 := 1/2 - q1 + q2 = p2:
    eq3 := -1/6 + (1/2)*q1 - q2 = p3:
    eq4 := 1/24 - (1/6)*q1 + (1/2)*q2 = 0:
    eq5 := -1/120 + (1/24)*q1 - (1/6)*q2 = 0:
    solve({eq1, eq2, eq3, eq4, eq5}, {q1, q2, p1, p2, p3})

This gives

\[ p_1 = -\frac{3}{5}, \quad p_2 = \frac{3}{20}, \quad p_3 = -\frac{1}{60}, \quad q_1 = \frac{2}{5}, \quad q_2 = \frac{1}{20}. \]

So the Padé approximation is

\[ r(x) = \frac{1 - \frac{3}{5}x + \frac{3}{20}x^2 - \frac{1}{60}x^3}{1 + \frac{2}{5}x + \frac{1}{20}x^2}. \]

Table 8.10 lists values of $r(x)$ and $P_5(x)$, the fifth Maclaurin polynomial. The Padé approximation is clearly superior in this example.

Table 8.10

    $x$    $e^{-x}$      $P_5(x)$      $|e^{-x} - P_5(x)|$     $r(x)$        $|e^{-x} - r(x)|$
    0.2    0.81873075    0.81873067    $8.64 \times 10^{-8}$   0.81873075    $7.55 \times 10^{-9}$
    0.4    0.67032005    0.67031467    $5.38 \times 10^{-6}$   0.67031963    $4.11 \times 10^{-7}$
    0.6    0.54881164    0.54875200    $5.96 \times 10^{-5}$   0.54880763    $4.00 \times 10^{-6}$
    0.8    0.44932896    0.44900267    $3.26 \times 10^{-4}$   0.44930966    $1.93 \times 10^{-5}$
    1.0    0.36787944    0.36666667    $1.21 \times 10^{-3}$   0.36781609    $6.33 \times 10^{-5}$

Maple can also be used directly to compute a Padé approximation. We first compute the Maclaurin series with the call

    series(exp(-x), x)

to obtain

\[ 1 - x + \frac{1}{2}x^2 - \frac{1}{6}x^3 + \frac{1}{24}x^4 - \frac{1}{120}x^5 + O(x^6). \]

The Padé approximation $r(x)$ with $n = 3$ and $m = 2$ is found using the command

    r := x -> convert(%, ratpoly, 3, 2);

where the % refers to the result of the preceding calculation, namely, the series. The Maple result is

\[ x \to \frac{1 - \frac{3}{5}x + \frac{3}{20}x^2 - \frac{1}{60}x^3}{1 + \frac{2}{5}x + \frac{1}{20}x^2}. \]

We can then compute, for example, $r(0.8)$ by entering

    r(0.8)

which produces the approximation 0.4493096647 to $e^{-0.8} = 0.449328964$.

Algorithm 8.1 implements the Padé approximation technique.


Padé Rational Approximation

To obtain the rational approximation

\[ r(x) = \frac{p(x)}{q(x)} = \frac{\sum_{i=0}^{n} p_i x^i}{\sum_{j=0}^{m} q_j x^j} \]

for a given function $f(x)$:

INPUT nonnegative integers $m$ and $n$.

OUTPUT coefficients $q_0, q_1, \ldots, q_m$ and $p_0, p_1, \ldots, p_n$.

Step 1 Set $N = m + n$.

Step 2 For $i = 0, 1, \ldots, N$ set $a_i = \dfrac{f^{(i)}(0)}{i!}$.
(The coefficients of the Maclaurin polynomial are $a_0, \ldots, a_N$, which could be input instead of calculated.)

Step 3 Set $q_0 = 1$;
$p_0 = a_0$.

Step 4 For $i = 1, 2, \ldots, N$ do Steps 5-10. (Set up a linear system with matrix $B$.)

    Step 5 For $j = 1, 2, \ldots, i - 1$
        if $j \le n$ then set $b_{i,j} = 0$.

    Step 6 If $i \le n$ then set $b_{i,i} = 1$.

    Step 7 For $j = i + 1, i + 2, \ldots, N$ set $b_{i,j} = 0$.

    Step 8 For $j = 1, 2, \ldots, i$
        if $j \le m$ then set $b_{i,n+j} = -a_{i-j}$.

    Step 9 For $j = n + i + 1, n + i + 2, \ldots, N$ set $b_{i,j} = 0$.

    Step 10 Set $b_{i,N+1} = a_i$.

(Steps 11-22 solve the linear system using partial pivoting.)

Step 11 For $i = n + 1, n + 2, \ldots, N - 1$ do Steps 12-18.

    Step 12 Let $k$ be the smallest integer with $i \le k \le N$ and $|b_{k,i}| = \max_{i \le j \le N} |b_{j,i}|$.
    (Find pivot element.)

    Step 13 If $b_{k,i} = 0$ then OUTPUT ("The system is singular");
        STOP.

    Step 14 If $k \ne i$ then (Interchange row $i$ and row $k$.)
        for $j = i, i + 1, \ldots, N + 1$ set
            $b_{COPY} = b_{i,j}$;
            $b_{i,j} = b_{k,j}$;
            $b_{k,j} = b_{COPY}$.

    Step 15 For $j = i + 1, i + 2, \ldots, N$ do Steps 16-18. (Perform elimination.)

        Step 16 Set $xm = \dfrac{b_{j,i}}{b_{i,i}}$.

        Step 17 For $k = i + 1, i + 2, \ldots, N + 1$
            set $b_{j,k} = b_{j,k} - xm \cdot b_{i,k}$.

        Step 18 Set $b_{j,i} = 0$.

Step 19 If $b_{N,N} = 0$ then OUTPUT ("The system is singular");
    STOP.

Step 20 If $m > 0$ then set $q_m = \dfrac{b_{N,N+1}}{b_{N,N}}$. (Start backward substitution.)

Step 21 For $i = N - 1, N - 2, \ldots, n + 1$ set $q_{i-n} = \dfrac{b_{i,N+1} - \sum_{j=i+1}^{N} b_{i,j} q_{j-n}}{b_{i,i}}$.

Step 22 For $i = n, n - 1, \ldots, 1$ set $p_i = b_{i,N+1} - \sum_{j=n+1}^{N} b_{i,j} q_{j-n}$.

Step 23 OUTPUT $(q_0, q_1, \ldots, q_m, p_0, p_1, \ldots, p_n)$;
    STOP. (The procedure was successful.)


Continued Fraction Approximation

Using continued fractions for rational approximation is a subject that has its roots in the works of Christopher Clavius (1537-1612). It was employed in the 18th and 19th centuries by, for example, Euler, Lagrange, and Hermite.

It is interesting to compare the number of arithmetic operations required for calculations of $P_5(x)$ and $r(x)$ in Example 1. Using nested multiplication, $P_5(x)$ can be expressed as

\[ P_5(x) = \left( \left( \left( \left( -\frac{1}{120}x + \frac{1}{24} \right)x - \frac{1}{6} \right)x + \frac{1}{2} \right)x - 1 \right)x + 1. \]

Assuming that the coefficients of $1, x, x^2, x^3, x^4$, and $x^5$ are represented as decimals, a single calculation of $P_5(x)$ in nested form requires five multiplications and five additions/subtractions.

Using nested multiplication, $r(x)$ is expressed as

\[ r(x) = \frac{\left( \left( -\frac{1}{60}x + \frac{3}{20} \right)x - \frac{3}{5} \right)x + 1}{\left( \frac{1}{20}x + \frac{2}{5} \right)x + 1}, \]

so a single calculation of $r(x)$ requires five multiplications, five additions/subtractions, and one division. Hence, computational effort appears to favor the polynomial approximation. However, by reexpressing $r(x)$ by continued division, we can write

\[ r(x) = \frac{1 - \frac{3}{5}x + \frac{3}{20}x^2 - \frac{1}{60}x^3}{1 + \frac{2}{5}x + \frac{1}{20}x^2} = \frac{-\frac{1}{3}x^3 + 3x^2 - 12x + 20}{x^2 + 8x + 20} \]
\[ = -\frac{1}{3}x + \frac{17}{3} + \frac{-\frac{152}{3}x - \frac{280}{3}}{x^2 + 8x + 20} = -\frac{1}{3}x + \frac{17}{3} + \frac{-\frac{152}{3}}{\left( \dfrac{x^2 + 8x + 20}{x + (35/19)} \right)} \]

or

\[ r(x) = -\frac{1}{3}x + \frac{17}{3} + \frac{-\frac{152}{3}}{\left( x + \frac{117}{19} + \dfrac{3125/361}{x + (35/19)} \right)}. \tag{8.16} \]

Written in this form, a single calculation of $r(x)$ requires one multiplication, five additions/subtractions, and two divisions. If the amount of computation required for division is approximately the same as for multiplication, the computational effort required for an evaluation of the polynomial $P_5(x)$ significantly exceeds that required for an evaluation of the rational function $r(x)$.

Expressing a rational function approximation in a form such as Eq. (8.16) is called continued-fraction approximation. This is a classical approximation technique of current interest because of the computational efficiency of this representation. It is, however, a specialized technique that we will not discuss further. A rather extensive treatment of this subject and of rational approximation in general can be found in [RR], pp. 285-322.

Although the rational-function approximation in Example 1 gave results superior to the polynomial approximation of the same degree, note that the approximation has a wide variation in accuracy. The approximation at 0.2 is accurate to within $8 \times 10^{-9}$, but at 1.0 the approximation and the function agree only to within $7 \times 10^{-5}$. This accuracy variation is expected because the Padé approximation is based on a Taylor polynomial representation of $e^{-x}$, and the Taylor representation has a wide variation of accuracy in $[0.2, 1.0]$.
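The nested form of $r(x)$ and the continued-fraction form in Eq. (8.16) can be compared directly. In the Python sketch below, which is only an illustration, the function and constant names are chosen here, and the constants are computed once in advance so that each call to `r_continued` uses one multiplication, five additions/subtractions, and two divisions.

```python
import math

# Constants of Eq. (8.16), computed once.
SLOPE = -1 / 3
C0 = 17 / 3
C1 = -152 / 3
C2 = 117 / 19
C3 = 3125 / 361
C4 = 35 / 19


def r_nested(x):
    """Pade approximation of e^(-x) from Example 1, numerator and denominator in nested form."""
    numerator = ((-x / 60 + 3 / 20) * x - 3 / 5) * x + 1
    denominator = (x / 20 + 2 / 5) * x + 1
    return numerator / denominator


def r_continued(x):
    """The same rational function written as the continued fraction (8.16)."""
    return SLOPE * x + C0 + C1 / (x + C2 + C3 / (x + C4))


for x in (0.2, 0.4, 0.6, 0.8, 1.0):
    print(x, r_nested(x), r_continued(x), math.exp(-x))
```

Both forms represent the same function, so their values should agree to within rounding error at every $x$ for which the denominators are nonzero.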


Chebyshev Rational Function Approximation

To obtain more uniformly accurate rational-function approximations we use Chebyshev polynomials, a class that exhibits more uniform behavior. The general Chebyshev rational-function approximation method proceeds in the same manner as Padé approximation, except that each $x^k$ term in the Padé approximation is replaced by the $k$th-degree Chebyshev polynomial $T_k(x)$.

Suppose we want to approximate the function $f$ by an $N$th-degree rational function $r$ written in the form

\[ r(x) = \frac{\sum_{k=0}^{n} p_k T_k(x)}{\sum_{k=0}^{m} q_k T_k(x)}, \quad \text{where } N = n + m \text{ and } q_0 = 1. \]

Writing $f(x)$ in a series involving Chebyshev polynomials as

\[ f(x) = \sum_{k=0}^{\infty} a_k T_k(x), \]

gives

\[ f(x) - r(x) = \sum_{k=0}^{\infty} a_k T_k(x) - \frac{\sum_{k=0}^{n} p_k T_k(x)}{\sum_{k=0}^{m} q_k T_k(x)} \]

or

\[ f(x) - r(x) = \frac{\sum_{k=0}^{\infty} a_k T_k(x) \sum_{k=0}^{m} q_k T_k(x) - \sum_{k=0}^{n} p_k T_k(x)}{\sum_{k=0}^{m} q_k T_k(x)}. \tag{8.17} \]

The coefficients $q_1, q_2, \ldots, q_m$ and $p_0, p_1, \ldots, p_n$ are chosen so that the numerator on the right-hand side of this equation has zero coefficients for $T_k(x)$ when $k = 0, 1, \ldots, N$. This implies that the series

\[ (a_0 T_0(x) + a_1 T_1(x) + \cdots)(T_0(x) + q_1 T_1(x) + \cdots + q_m T_m(x)) - (p_0 T_0(x) + p_1 T_1(x) + \cdots + p_n T_n(x)) \]

has no terms of degree less than or equal to $N$.

Two problems arise with the Chebyshev procedure that make it more difficult to implement than the Padé method. One occurs because the product of the polynomial $q(x)$ and the series for $f(x)$ involves products of Chebyshev polynomials. This problem is resolved by making use of the relationship

\[ T_i(x) T_j(x) = \frac{1}{2}\left[ T_{i+j}(x) + T_{|i-j|}(x) \right]. \tag{8.18} \]

(See Exercise 8 of Section 8.3.) The other problem is more difficult to resolve and involves the computation of the Chebyshev series for $f(x)$. In theory, this is not difficult, for if

\[ f(x) = \sum_{k=0}^{\infty} a_k T_k(x), \]

then the orthogonality of the Chebyshev polynomials implies that

\[ a_0 = \frac{1}{\pi} \int_{-1}^{1} \frac{f(x)}{\sqrt{1 - x^2}}\,dx \quad \text{and} \quad a_k = \frac{2}{\pi} \int_{-1}^{1} \frac{f(x) T_k(x)}{\sqrt{1 - x^2}}\,dx, \quad \text{where } k \ge 1. \]

Practically, however, these integrals can seldom be evaluated in closed form, and a numerical integration technique is required for each evaluation.
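One simple way to carry out this numerical integration is to substitute $x = \cos\theta$, which turns the integrals into $\int_0^{\pi} f(\cos\theta)\cos(k\theta)\,d\theta$ and removes the endpoint singularity of the weight. The Python sketch below is an illustration under that assumption; the composite midpoint rule, the number of panels, and the function name are choices made here, not a method prescribed by the text.

```python
import math


def chebyshev_series_coefficients(f, count, panels=200):
    """Approximate a_0, ..., a_{count-1} in f(x) = sum a_k T_k(x) on [-1, 1].

    Uses a_0 = (1/pi) * I_0 and a_k = (2/pi) * I_k for k >= 1, where
    I_k is the integral of f(cos(theta)) * cos(k*theta) over [0, pi],
    estimated with the composite midpoint rule.
    """
    h = math.pi / panels
    thetas = [(i + 0.5) * h for i in range(panels)]
    coefficients = []
    for k in range(count):
        integral = h * sum(f(math.cos(t)) * math.cos(k * t) for t in thetas)
        scale = 1.0 if k == 0 else 2.0
        coefficients.append(scale * integral / math.pi)
    return coefficients


a = chebyshev_series_coefficients(lambda x: math.exp(-x), 6)
for k, ak in enumerate(a):
    print(f"a_{k} = {ak: .6f}")
```

For $f(x) = e^{-x}$ the first two values are approximately $1.266066$ and $-1.130318$.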


Example 2 The terms of the Chebyshev expansion for $e^{-x}$ through degree five are

\[ \tilde{P}_5(x) = 1.266066T_0(x) - 1.130318T_1(x) + 0.271495T_2(x) - 0.044337T_3(x) + 0.005474T_4(x) - 0.000543T_5(x). \]

Determine the Chebyshev rational approximation of degree 5 with $n = 3$ and $m = 2$.

Solution Finding this approximation requires choosing $p_0$, $p_1$, $p_2$, $p_3$, $q_1$, and $q_2$ so that for $k = 0, 1, 2, 3, 4$, and 5, the coefficients of $T_k(x)$ are 0 in the expansion

\[ \tilde{P}_5(x)[T_0(x) + q_1 T_1(x) + q_2 T_2(x)] - [p_0 T_0(x) + p_1 T_1(x) + p_2 T_2(x) + p_3 T_3(x)]. \]

Using the relation (8.18) and collecting terms gives the equations

\[ T_0: \quad 1.266066 - 0.565159q_1 + 0.1357485q_2 = p_0, \]
\[ T_1: \quad -1.130318 + 1.401814q_1 - 0.587328q_2 = p_1, \]
\[ T_2: \quad 0.271495 - 0.587328q_1 + 1.268803q_2 = p_2, \]
\[ T_3: \quad -0.044337 + 0.138485q_1 - 0.565431q_2 = p_3, \]
\[ T_4: \quad 0.005474 - 0.022440q_1 + 0.135748q_2 = 0, \]
\[ T_5: \quad -0.000543 + 0.002737q_1 - 0.022169q_2 = 0. \]

The solution to this system produces the rational function

\[ r_T(x) = \frac{1.055265T_0(x) - 0.613016T_1(x) + 0.077478T_2(x) - 0.004506T_3(x)}{T_0(x) + 0.378331T_1(x) + 0.022216T_2(x)}. \]

We found at the beginning of Section 8.3 that

\[ T_0(x) = 1, \quad T_1(x) = x, \quad T_2(x) = 2x^2 - 1, \quad T_3(x) = 4x^3 - 3x. \]

Using these to convert to an expression involving powers of $x$ gives

\[ r_T(x) = \frac{0.977787 - 0.599499x + 0.154956x^2 - 0.018022x^3}{0.977784 + 0.378331x + 0.044432x^2}. \]

Table 8.11 lists values of $r_T(x)$ and, for comparison purposes, the values of $r(x)$ obtained in Example 1. Note that the approximation given by $r(x)$ is superior to that of $r_T(x)$ for $x = 0.2$ and 0.4, but that the maximum error for $r(x)$ is $6.33 \times 10^{-5}$ compared to $9.13 \times 10^{-6}$ for $r_T(x)$.

Table 8.11

    $x$    $e^{-x}$      $r(x)$        $|e^{-x} - r(x)|$       $r_T(x)$      $|e^{-x} - r_T(x)|$
    0.2    0.81873075    0.81873075    $7.55 \times 10^{-9}$   0.81872510    $5.66 \times 10^{-6}$
    0.4    0.67032005    0.67031963    $4.11 \times 10^{-7}$   0.67031310    $6.95 \times 10^{-6}$
    0.6    0.54881164    0.54880763    $4.00 \times 10^{-6}$   0.54881292    $1.28 \times 10^{-6}$
    0.8    0.44932896    0.44930966    $1.93 \times 10^{-5}$   0.44933809    $9.13 \times 10^{-6}$
    1.0    0.36787944    0.36781609    $6.33 \times 10^{-5}$   0.36787155    $7.89 \times 10^{-6}$
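The system in Example 2 can be assembled mechanically from the product rule (8.18). The following Python sketch is an illustration, not Algorithm 8.2: it treats the Chebyshev coefficients beyond those supplied as zero (as the example does with $\tilde{P}_5$), solves first for $q_1, \ldots, q_m$ from the equations for $T_{n+1}, \ldots, T_N$, and then reads off $p_0, \ldots, p_n$. The function names are chosen here.

```python
def times_T(a, j, size):
    """Chebyshev coefficients (indices 0..size-1) of (sum a_i T_i) * T_j, using Eq. (8.18)."""
    out = [0.0] * size
    for i, ai in enumerate(a):
        for idx in (i + j, abs(i - j)):      # T_i T_j = (T_{i+j} + T_{|i-j|}) / 2
            if idx < size:
                out[idx] += 0.5 * ai
    return out


def solve(A, b):
    """Gaussian elimination with partial pivoting followed by back substitution."""
    n = len(b)
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[piv] = M[piv], M[c]
        for r in range(c + 1, n):
            factor = M[r][c] / M[c][c]
            M[r] = [v - factor * w for v, w in zip(M[r], M[c])]
    x = [0.0] * n
    for r in reversed(range(n)):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x


def chebyshev_rational(a, n, m):
    """Return (p, q), q[0] = 1, making the T_0..T_N coefficients of the numerator of (8.17) vanish."""
    N = n + m
    cols = [times_T(a, j, N + 1) for j in range(m + 1)]   # cols[j][k]: T_k part of series * T_j
    A = [[cols[j][k] for j in range(1, m + 1)] for k in range(n + 1, N + 1)]
    b = [-cols[0][k] for k in range(n + 1, N + 1)]
    q = [1.0] + (solve(A, b) if m > 0 else [])
    p = [sum(cols[j][k] * q[j] for j in range(m + 1)) for k in range(n + 1)]
    return p, q


a = [1.266066, -1.130318, 0.271495, -0.044337, 0.005474, -0.000543]
p, q = chebyshev_rational(a, 3, 2)
print("p:", [round(v, 6) for v in p])
print("q:", [round(v, 6) for v in q])
```

Because the input coefficients are rounded to six decimal places, the computed $p_k$ and $q_k$ can differ from the values quoted in Example 2 in the last digit or two.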


The Chebyshev approximation can be generated using Algorithm 8.2. 


Chebyshev Rational Approximation

To obtain the rational approximation

\[ r_T(x) = \frac{\sum_{k=0}^{n} p_k T_k(x)}{\sum_{k=0}^{m} q_k T_k(x)} \]

for a given function $f(x)$:

INPUT nonnegative integers $m$ and $n$.

OUTPUT coefficients $q_0, q_1, \ldots, q_m$ and $p_0, p_1, \ldots, p_n$.

Step 1 Set $N = m + n$.

Step 2 Set $a_0 = \dfrac{2}{\pi} \displaystyle\int_0^{\pi} f(\cos\theta)\,d\theta$; (The coefficient $a_0$ is doubled for computational efficiency.)
    For $k = 1, 2, \ldots, N + m$ set
        $a_k = \dfrac{2}{\pi} \displaystyle\int_0^{\pi} f(\cos\theta) \cos k\theta\,d\theta$.
    (The integrals can be evaluated using a numerical integration procedure or the coefficients can be input directly.)

Step 3 Set $q_0 = 1$.

Step 4 For $i = 0, 1, \ldots, N$ do Steps 5-9. (Set up a linear system with matrix $B$.)

    Step 5 For $j = 0, 1, \ldots, i$
        if $j \le n$ then set $b_{i,j} = 0$.

    Step 6 If $i \le n$ then set $b_{i,i} = 1$.

    Step 7 For $j = i + 1, i + 2, \ldots, n$ set $b_{i,j} = 0$.

    Step 8 For $j = n + 1, n + 2, \ldots, N$
        if $i \ne 0$ then set $b_{i,j} = -\frac{1}{2}(a_{i+j-n} + a_{|i-j+n|})$
        else set $b_{i,j} = -\frac{1}{2}a_{j-n}$.

    Step 9 If $i \ne 0$ then set $b_{i,N+1} = a_i$
        else set $b_{i,N+1} = \frac{1}{2}a_i$.

(Steps 10-21 solve the linear system using partial pivoting.)

Step 10 For $i = n + 1, n + 2, \ldots, N - 1$ do Steps 11-17.

    Step 11 Let $k$ be the smallest integer with $i \le k \le N$ and $|b_{k,i}| = \max_{i \le j \le N} |b_{j,i}|$. (Find pivot element.)

    Step 12 If $b_{k,i} = 0$ then OUTPUT ("The system is singular");
        STOP.

    Step 13 If $k \ne i$ then (Interchange row $i$ and row $k$.)
        for $j = i, i + 1, \ldots, N + 1$ set
            $b_{COPY} = b_{i,j}$;
            $b_{i,j} = b_{k,j}$;
            $b_{k,j} = b_{COPY}$.

    Step 14 For $j = i + 1, i + 2, \ldots, N$ do Steps 15-17. (Perform elimination.)

        Step 15 Set $xm = \dfrac{b_{j,i}}{b_{i,i}}$.

        Step 16 For $k = i + 1, i + 2, \ldots, N + 1$
            set $b_{j,k} = b_{j,k} - xm \cdot b_{i,k}$.

        Step 17 Set $b_{j,i} = 0$.

Step 18 If $b_{N,N} = 0$ then OUTPUT ("The system is singular");
    STOP.

Step 19 If $m > 0$ then set $q_m = \dfrac{b_{N,N+1}}{b_{N,N}}$. (Start backward substitution.)

Step 20 For $i = N - 1, N - 2, \ldots, n + 1$ set $q_{i-n} = \dfrac{b_{i,N+1} - \sum_{j=i+1}^{N} b_{i,j} q_{j-n}}{b_{i,i}}$.

Step 21 For $i = n, n - 1, \ldots, 0$ set $p_i = b_{i,N+1} - \sum_{j=n+1}^{N} b_{i,j} q_{j-n}$.

Step 22 OUTPUT $(q_0, q_1, \ldots, q_m, p_0, p_1, \ldots, p_n)$;
    STOP. (The procedure was successful.)


We can obtain both the Chebyshev series expansion and the Chebyshev rational approximation in Maple with the orthopoly and numapprox packages. Load the packages and then enter the command


g := chebyshev(exp(-x), x, 0.00001)
In 1930, Evgeny Remez
(1896-1975) developed general 
computational methods of 
Chebyshev approximation for 
polynomials. He later developed a 
similar algorithm for the rational 
approximation of continuous 
functions defined on an interval 
with a prescribed degree of 
accuracy. His work encompassed 
various areas of approximation 
theory as well as the methods for 
approximating the solutions of 
differential equations. 
The parameter 0.00001 tells Maple to truncate the series when the remaining coefficients divided by the largest coefficient are smaller than 0.00001. Maple returns


1.266065878 T(0, x) - 1.130318208 T(1, x) + 0.2714953396 T(2, x) - 0.04433684985 T(3, x)
+ 0.005474240442 T(4, x) - 0.0005429263119 T(5, x) + 0.00004497732296 T(6, x)
- 0.000003198436462 T(7, x)


The approximation to $e^{-0.8} = 0.449328964$ is found with

evalf(subs(x = .8, g))

0.4493288893


To obtain the Chebyshev rational approximation enter

gg := convert(chebyshev(exp(-x), x, 0.00001), ratpoly, 3, 2)

resulting in

\[ gg := \frac{0.9763521942 - 0.5893075371x + 0.1483579430x^2 - 0.01643823341x^3}{0.9763483269 + 0.3870509565x + 0.04730334625x^2} \]

We can evaluate $gg(0.8)$ by

evalf(subs(x = 0.8, gg))

which gives 0.4493317577 as an approximation to $e^{-0.8} = 0.449328964$.
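
The two Maple evaluations can be reproduced outside Maple as a quick sanity check. The following is a minimal Python sketch, assuming only the coefficients printed in the Maple output; the names `CHEB`, `NUM`, `DEN`, `cheb_series`, `poly`, and `rational` are illustrative and are not part of Maple or of the text.

```python
import math

# Coefficients copied from the Maple output quoted in the text.
CHEB = [1.266065878, -1.130318208, 0.2714953396, -0.04433684985,
        0.005474240442, -0.0005429263119, 0.00004497732296,
        -0.000003198436462]
NUM = [0.9763521942, -0.5893075371, 0.1483579430, -0.01643823341]
DEN = [0.9763483269, 0.3870509565, 0.04730334625]


def cheb_series(coeffs, x):
    """Evaluate sum_k coeffs[k] * T_k(x) using T_{k+1} = 2x T_k - T_{k-1}."""
    t_prev, t_curr = 1.0, x          # T_0(x) and T_1(x)
    total = coeffs[0]
    for c in coeffs[1:]:
        total += c * t_curr
        t_prev, t_curr = t_curr, 2.0 * x * t_curr - t_prev
    return total


def poly(coeffs, x):
    """Evaluate a polynomial given in ascending powers of x (Horner's rule)."""
    result = 0.0
    for c in reversed(coeffs):
        result = result * x + c
    return result


def rational(x):
    """Evaluate the quoted Chebyshev rational approximation at x."""
    return poly(NUM, x) / poly(DEN, x)


if __name__ == "__main__":
    print(cheb_series(CHEB, 0.8), rational(0.8), math.exp(-0.8))
```

Both values agree with $e^{-0.8}$ to about five decimal places, which matches the accuracy reported in the text.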

The Chebyshev method does not produce the best rational function approximation in the sense of the approximation whose maximum approximation error is minimal. The method can, however, be used as a starting point for an iterative method known as the second Remez algorithm that converges to the best approximation. A discussion of the techniques involved with this procedure and an improvement on this algorithm can be found in [RR], pp. 292-305, or in [Pow], pp. 90-92.


EXERCISE SET 8.4


1. Determine all degree 2 Padé approximations for $f(x) = e^{2x}$. Compare the results at $x_i = 0.2i$, for $i = 1, 2, 3, 4, 5$, with the actual values $f(x_i)$.

2. Determine all degree 3 Padé approximations for $f(x) = x \ln(x + 1)$. Compare the results at $x_i = 0.2i$, for $i = 1, 2, 3, 4, 5$, with the actual values $f(x_i)$.

3. Determine the Padé approximation of degree 5 with $n = 2$ and $m = 3$ for $f(x) = e^x$. Compare the results at $x_i = 0.2i$, for $i = 1, 2, 3, 4, 5$, with those from the fifth Maclaurin polynomial.

4. Repeat Exercise 3 using instead the Padé approximation of degree 5 with $n = 3$ and $m = 2$. Compare the results at each $x_i$ with those computed in Exercise 3.

5. Determine the Padé approximation of degree 6 with $n = m = 3$ for $f(x) = \sin x$. Compare the results at $x_i = 0.1i$, for $i = 0, 1, \ldots, 5$, with the exact results and with the results of the sixth Maclaurin polynomial.

6. Determine the Padé approximations of degree 6 with (a) $n = 2$, $m = 4$ and (b) $n = 4$, $m = 2$ for $f(x) = \sin x$. Compare the results at each $x_i$ to those obtained in Exercise 5.

7. Table 8.10 lists results of the Padé approximation of degree 5 with $n = 3$ and $m = 2$, the fifth Maclaurin polynomial, and the exact values of $f(x) = e^{-x}$ when $x_i = 0.2i$, for $i = 1, 2, 3, 4$, and 5. Compare these results with those produced from the other Padé approximations of degree five.
   a. $n = 0$, $m = 5$   b. $n = 1$, $m = 4$   c. $n = 3$, $m = 2$   d. $n = 4$, $m = 1$

8. Express the following rational functions in continued-fraction form:
   a. $\dfrac{x^2 + 3x + 2}{x^2 - x + 1}$   b. $\dfrac{4x^2 + 3x - 7}{2x^3 + x^2 - x + 5}$
   c. $\dfrac{2x^3 - 3x^2 + 4x - 5}{x^2 + 2x + 4}$   d. $\dfrac{2x^3 + x^2 - x + 3}{3x^3 + 2x^2 - x + 1}$


9. Find all the Chebyshev rational approximations of degree 2 for $f(x) = e^{-x}$. Which give the best approximations to $f(x) = e^{-x}$ at $x = 0.25$, $0.5$, and $1$?

10. Find all the Chebyshev rational approximations of degree 3 for $f(x) = \cos x$. Which give the best approximations to $f(x) = \cos x$ at $x = \pi/4$ and $\pi/3$?

11. Find the Chebyshev rational approximation of degree 4 with $n = m = 2$ for $f(x) = \sin x$. Compare the results at $x_i = 0.1i$, for $i = 0, 1, 2, 3, 4, 5$, from this approximation with those obtained in Exercise 5 using a sixth-degree Padé approximation.

12. Find all Chebyshev rational approximations of degree 5 for $f(x) = e^x$. Compare the results at $x_i = 0.2i$, for $i = 1, 2, 3, 4, 5$, with those obtained in Exercises 3 and 4.


13. To accurately approximate $f(x) = e^x$ for inclusion in a mathematical library, we first restrict the domain of $f$. Given a real number $x$, divide by $\ln\sqrt{10}$ to obtain the relation

\[ x = M \cdot \ln\sqrt{10} + s, \]

where $M$ is an integer and $s$ is a real number satisfying $|s| \le \frac{1}{2}\ln\sqrt{10}$.
   a. Show that $e^x = e^s \cdot 10^{M/2}$.
   b. Construct a rational function approximation for $e^s$ using $n = m = 3$. Estimate the error when $0 \le |s| \le \frac{1}{2}\ln\sqrt{10}$.
   c. Design an implementation of $e^x$ using the results of parts (a) and (b) and the approximations

\[ \frac{1}{\ln\sqrt{10}} = 0.8685889638 \quad\text{and}\quad \sqrt{10} = 3.162277660. \]

14. To accurately approximate $\sin x$ and $\cos x$ for inclusion in a mathematical library, we first restrict their domains. Given a real number $x$, divide by $\pi$ to obtain the relation

\[ |x| = M\pi + s, \quad\text{where $M$ is an integer and } |s| \le \frac{\pi}{2}. \]

   a. Show that $\sin x = \operatorname{sgn}(x) \cdot (-1)^M \cdot \sin s$.
   b. Construct a rational approximation to $\sin s$ using $n = m = 4$. Estimate the error when $0 \le |s| \le \pi/2$.
   c. Design an implementation of $\sin x$ using parts (a) and (b).
   d. Repeat part (c) for $\cos x$ using the fact that $\cos x = \sin(x + \pi/2)$.


8.5 Trigonometric Polynomial Approximation 


The use of series of sine and cosine functions to represent arbitrary functions had its be- 
ginnings in the 1750s with the study of the motion of a vibrating string. This problem was 
considered by Jean d’Alembert and then taken up by the foremost mathematician of the 
time, Leonhard Euler. But it was Daniel Bernoulli who first advocated the use of the infinite 
sums of sine and cosines as a solution to the problem, sums that we now know as Fourier 
series. In the early part of the 19th century, Jean Baptiste Joseph Fourier used these series 
to study the flow of heat and developed quite a complete theory of the subject. 
During the late 17th and early
18th centuries, the Bernoulli 
family produced no less than 8 
important mathematicians and 
physicists. Daniel Bernoulli’s 
most important work involved the 
pressure, density, and velocity of 
fluid flow, which produced what 
is known as the Bernoulli 
principle. 


Joseph Fourier (1768-1830) 
published his theory of 
trigonometric series in Théorie 
analytique de la chaleur to solve 
the problem of steady state heat 
distribution in a solid. 


Example 1 
The first observation in the development of Fourier series is that, for each positive integer $n$, the set of functions $\{\phi_0, \phi_1, \ldots, \phi_{2n-1}\}$, where

\[ \phi_0(x) = \frac{1}{2}, \]

\[ \phi_k(x) = \cos kx, \quad\text{for each } k = 1, 2, \ldots, n, \]

and

\[ \phi_{n+k}(x) = \sin kx, \quad\text{for each } k = 1, 2, \ldots, n - 1, \]

is an orthogonal set on $[-\pi, \pi]$ with respect to $w(x) \equiv 1$. This orthogonality follows from the fact that for every nonzero integer $j$, the integrals of $\sin jx$ and $\cos jx$ over $[-\pi, \pi]$ are 0, and we can rewrite products of sine and cosine functions as sums by using the three trigonometric identities

\[
\begin{aligned}
\sin t_1 \sin t_2 &= \tfrac{1}{2}\left[\cos(t_1 - t_2) - \cos(t_1 + t_2)\right], \\
\cos t_1 \cos t_2 &= \tfrac{1}{2}\left[\cos(t_1 - t_2) + \cos(t_1 + t_2)\right], \\
\sin t_1 \cos t_2 &= \tfrac{1}{2}\left[\sin(t_1 - t_2) + \sin(t_1 + t_2)\right].
\end{aligned}
\tag{8.19}
\]


Orthogonal Trigonometric Polynomials 


Let $\mathcal{T}_n$ denote the set of all linear combinations of the functions $\phi_0, \phi_1, \ldots, \phi_{2n-1}$. This set is called the set of trigonometric polynomials of degree less than or equal to $n$. (Some sources also include an additional function in the set, $\phi_{2n}(x) = \sin nx$.)

For a function $f \in C[-\pi, \pi]$, we want to find the continuous least squares approximation by functions in $\mathcal{T}_n$ in the form

\[ S_n(x) = \frac{a_0}{2} + a_n \cos nx + \sum_{k=1}^{n-1} (a_k \cos kx + b_k \sin kx). \]

Since the set of functions $\{\phi_0, \phi_1, \ldots, \phi_{2n-1}\}$ is orthogonal on $[-\pi, \pi]$ with respect to $w(x) \equiv 1$, it follows from Theorem 8.6 on page 515 and the equations in (8.19) that the appropriate selection of coefficients is

\[ a_k = \frac{\int_{-\pi}^{\pi} f(x) \cos kx \, dx}{\int_{-\pi}^{\pi} (\cos kx)^2 \, dx} = \frac{1}{\pi} \int_{-\pi}^{\pi} f(x) \cos kx \, dx, \quad\text{for each } k = 0, 1, 2, \ldots, n, \tag{8.20} \]

and

\[ b_k = \frac{\int_{-\pi}^{\pi} f(x) \sin kx \, dx}{\int_{-\pi}^{\pi} (\sin kx)^2 \, dx} = \frac{1}{\pi} \int_{-\pi}^{\pi} f(x) \sin kx \, dx, \quad\text{for each } k = 1, 2, \ldots, n - 1. \tag{8.21} \]

The limit of $S_n(x)$ when $n \to \infty$ is called the Fourier series of $f$. Fourier series are used to describe the solution of various ordinary and partial differential equations that occur in physical situations.


Example 1 Determine the trigonometric polynomial from $\mathcal{T}_n$ that approximates

\[ f(x) = |x|, \quad\text{for } -\pi < x < \pi. \]
Solution We first need to find the coefficients

\[ a_0 = \frac{1}{\pi} \int_{-\pi}^{\pi} |x| \, dx = -\frac{1}{\pi} \int_{-\pi}^{0} x \, dx + \frac{1}{\pi} \int_{0}^{\pi} x \, dx = \frac{2}{\pi} \int_{0}^{\pi} x \, dx = \pi, \]

\[ a_k = \frac{1}{\pi} \int_{-\pi}^{\pi} |x| \cos kx \, dx = \frac{2}{\pi} \int_{0}^{\pi} x \cos kx \, dx = \frac{2}{\pi k^2} \left[(-1)^k - 1\right], \]

for each $k = 1, 2, \ldots, n$, and

\[ b_k = \frac{1}{\pi} \int_{-\pi}^{\pi} |x| \sin kx \, dx = 0, \quad\text{for each } k = 1, 2, \ldots, n - 1. \]

That the $b_k$'s are all 0 follows from the fact that $g(x) = |x| \sin kx$ is an odd function for each $k$, and the integral of a continuous odd function over an interval of the form $[-a, a]$ is 0. (See Exercises 13 and 14.) The trigonometric polynomial from $\mathcal{T}_n$ approximating $f$ is, therefore,

\[ S_n(x) = \frac{\pi}{2} + \frac{2}{\pi} \sum_{k=1}^{n} \frac{(-1)^k - 1}{k^2} \cos kx. \]

The first few trigonometric polynomials for $f(x) = |x|$ are shown in Figure 8.13.


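Figure 8.13 (curves shown: $y = S_0(x) = \frac{\pi}{2}$; $y = S_1(x) = S_2(x) = \frac{\pi}{2} - \frac{4}{\pi} \cos x$; $y = S_3(x) = \frac{\pi}{2} - \frac{4}{\pi} \cos x - \frac{4}{9\pi} \cos 3x$)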


The Fourier series for $f$ is

\[ S(x) = \lim_{n \to \infty} S_n(x) = \frac{\pi}{2} + \frac{2}{\pi} \sum_{k=1}^{\infty} \frac{(-1)^k - 1}{k^2} \cos kx. \]

Since $|\cos kx| \le 1$ for every $k$ and $x$, the series converges, and $S(x)$ exists for all real numbers $x$.
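
The closed-form coefficients for $f(x) = |x|$ are easy to check numerically. The sketch below is a minimal Python illustration, not part of the original text: `abs_coefficient`, `s_n`, and `a_k_quadrature` are hypothetical helper names, and the midpoint rule is only one convenient choice of quadrature.

```python
import math


def abs_coefficient(k):
    """a_k for f(x) = |x| on [-pi, pi]: a_0 = pi, a_k = 2((-1)^k - 1)/(pi k^2)."""
    if k == 0:
        return math.pi
    return 2.0 * ((-1) ** k - 1) / (math.pi * k * k)


def s_n(x, n):
    """Evaluate S_n(x) = a_0/2 + sum_{k=1}^n a_k cos kx (every b_k is 0)."""
    return abs_coefficient(0) / 2.0 + sum(
        abs_coefficient(k) * math.cos(k * x) for k in range(1, n + 1))


def a_k_quadrature(k, panels=20000):
    """Midpoint-rule estimate of (1/pi) * integral of |x| cos kx over [-pi, pi]."""
    h = 2.0 * math.pi / panels
    total = 0.0
    for i in range(panels):
        x = -math.pi + (i + 0.5) * h
        total += abs(x) * math.cos(k * x)
    return total * h / math.pi


if __name__ == "__main__":
    for n in (0, 1, 3, 25):
        print(n, s_n(1.0, n))   # values approach |1.0| = 1 as n grows
```

Because the even-indexed coefficients vanish, $S_1 = S_2$, which is the coincidence noted in Figure 8.13.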
Discrete Trigonometric Approximation


There is a discrete analog that is useful for the discrete least squares approximation and the interpolation of large amounts of data.

Suppose that a collection of $2m$ paired data points $\{(x_j, y_j)\}_{j=0}^{2m-1}$ is given, with the first elements in the pairs equally partitioning a closed interval. For convenience, we assume that the interval is $[-\pi, \pi]$, so, as shown in Figure 8.14,

\[ x_j = -\pi + \left(\frac{j}{m}\right)\pi, \quad\text{for each } j = 0, 1, \ldots, 2m - 1. \tag{8.22} \]

If the interval is not $[-\pi, \pi]$, a simple linear transformation can be used to transform the data into this form.

Figure 8.14 


The goal in the discrete case is to determine the trigonometric polynomial $S_n(x)$ in $\mathcal{T}_n$ that will minimize

\[ E(S_n) = \sum_{j=0}^{2m-1} \left[y_j - S_n(x_j)\right]^2. \]

To do this we need to choose the constants $a_0, a_1, \ldots, a_n, b_1, b_2, \ldots, b_{n-1}$ to minimize

\[ E(S_n) = \sum_{j=0}^{2m-1} \left\{ y_j - \left[ \frac{a_0}{2} + a_n \cos nx_j + \sum_{k=1}^{n-1} (a_k \cos kx_j + b_k \sin kx_j) \right] \right\}^2. \tag{8.23} \]

The determination of the constants is simplified by the fact that the set $\{\phi_0, \phi_1, \ldots, \phi_{2n-1}\}$ is orthogonal with respect to summation over the equally spaced points $\{x_j\}_{j=0}^{2m-1}$ in $[-\pi, \pi]$. By this we mean that for each $k \ne l$,

\[ \sum_{j=0}^{2m-1} \phi_k(x_j)\, \phi_l(x_j) = 0. \tag{8.24} \]

To show this orthogonality, we use the following lemma. 


Lemma 8.12 Suppose that the integer $r$ is not a multiple of $2m$. Then

\[ \sum_{j=0}^{2m-1} \cos rx_j = 0 \quad\text{and}\quad \sum_{j=0}^{2m-1} \sin rx_j = 0. \]

Moreover, if $r$ is not a multiple of $m$, then

\[ \sum_{j=0}^{2m-1} (\cos rx_j)^2 = m \quad\text{and}\quad \sum_{j=0}^{2m-1} (\sin rx_j)^2 = m. \]
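
Lemma 8.12 can be spot-checked numerically before reading the proof. The following minimal Python sketch assumes the nodes $x_j = -\pi + (j/m)\pi$ from (8.22); the function names `nodes` and `lemma_sums` are illustrative only.

```python
import math


def nodes(m):
    """The 2m equally spaced points x_j = -pi + (j/m) pi, j = 0, ..., 2m - 1."""
    return [-math.pi + j * math.pi / m for j in range(2 * m)]


def lemma_sums(r, m):
    """Return the four sums of Lemma 8.12 for the integer r."""
    xs = nodes(m)
    return (sum(math.cos(r * x) for x in xs),
            sum(math.sin(r * x) for x in xs),
            sum(math.cos(r * x) ** 2 for x in xs),
            sum(math.sin(r * x) ** 2 for x in xs))


if __name__ == "__main__":
    print(lemma_sums(3, 4))   # r = 3 is not a multiple of m = 4
    print(lemma_sums(4, 4))   # r = m: the squared-cosine sum becomes 2m
```

The second call illustrates why the hypothesis on the squared sums matters: when $r = m$, the squared-cosine sum is $2m$ rather than $m$.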
Euler first used the symbol $i$ in 1794 to represent $\sqrt{-1}$ in his memoir De Formulis Differentialibus Angularibus.

Proof Euler's Formula states that with $i^2 = -1$, we have, for every real number $z$,

\[ e^{iz} = \cos z + i \sin z. \tag{8.25} \]

Applying this result gives

\[ \sum_{j=0}^{2m-1} \cos rx_j + i \sum_{j=0}^{2m-1} \sin rx_j = \sum_{j=0}^{2m-1} (\cos rx_j + i \sin rx_j) = \sum_{j=0}^{2m-1} e^{irx_j}. \]

But

\[ e^{irx_j} = e^{ir(-\pi + j\pi/m)} = e^{-ir\pi} \cdot e^{irj\pi/m}, \]

so

\[ \sum_{j=0}^{2m-1} \cos rx_j + i \sum_{j=0}^{2m-1} \sin rx_j = e^{-ir\pi} \sum_{j=0}^{2m-1} e^{irj\pi/m}. \]

Since $\sum_{j=0}^{2m-1} e^{irj\pi/m}$ is a geometric series with first term 1 and ratio $e^{ir\pi/m} \ne 1$, we have

\[ \sum_{j=0}^{2m-1} e^{irj\pi/m} = \frac{1 - \left(e^{ir\pi/m}\right)^{2m}}{1 - e^{ir\pi/m}} = \frac{1 - e^{2ir\pi}}{1 - e^{ir\pi/m}}. \]

But $e^{2ir\pi} = \cos 2r\pi + i \sin 2r\pi = 1$, so $1 - e^{2ir\pi} = 0$ and

\[ \sum_{j=0}^{2m-1} \cos rx_j + i \sum_{j=0}^{2m-1} \sin rx_j = e^{-ir\pi} \sum_{j=0}^{2m-1} e^{irj\pi/m} = 0. \]

This implies that both the real and imaginary parts are zero, so

\[ \sum_{j=0}^{2m-1} \cos rx_j = 0 \quad\text{and}\quad \sum_{j=0}^{2m-1} \sin rx_j = 0. \]

In addition, if $r$ is not a multiple of $m$, then $2r$ is not a multiple of $2m$, so these sums imply that

\[ \sum_{j=0}^{2m-1} (\cos rx_j)^2 = \sum_{j=0}^{2m-1} \frac{1}{2}(1 + \cos 2rx_j) = \frac{1}{2}\left[2m + \sum_{j=0}^{2m-1} \cos 2rx_j\right] = \frac{1}{2}(2m + 0) = m \]

and, similarly, that

\[ \sum_{j=0}^{2m-1} (\sin rx_j)^2 = \sum_{j=0}^{2m-1} \frac{1}{2}(1 - \cos 2rx_j) = m. \]

We can now show the orthogonality stated in (8.24). Consider, for example, the case

\[ \sum_{j=0}^{2m-1} \phi_k(x_j)\, \phi_{n+l}(x_j) = \sum_{j=0}^{2m-1} (\cos kx_j)(\sin lx_j). \]

Since

\[ \cos kx_j \sin lx_j = \frac{1}{2}\left[\sin(l + k)x_j + \sin(l - k)x_j\right] \]

and $(l + k)$ and $(l - k)$ are both integers that are not multiples of $2m$, Lemma 8.12 implies that

\[ \sum_{j=0}^{2m-1} (\cos kx_j)(\sin lx_j) = \frac{1}{2}\left[\sum_{j=0}^{2m-1} \sin(l + k)x_j + \sum_{j=0}^{2m-1} \sin(l - k)x_j\right] = \frac{1}{2}(0 + 0) = 0. \]

This technique is used to show that the orthogonality condition is satisfied for any pair of the functions and to produce the following result.


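Theorem 8.13 The constants in the summation

\[ S_n(x) = \frac{a_0}{2} + a_n \cos nx + \sum_{k=1}^{n-1} (a_k \cos kx + b_k \sin kx) \]

that minimize the least squares sum

\[ E(a_0, \ldots, a_n, b_1, \ldots, b_{n-1}) = \sum_{j=0}^{2m-1} \left(y_j - S_n(x_j)\right)^2 \]

are

\[ a_k = \frac{1}{m} \sum_{j=0}^{2m-1} y_j \cos kx_j, \quad\text{for each } k = 0, 1, \ldots, n, \]

and

\[ b_k = \frac{1}{m} \sum_{j=0}^{2m-1} y_j \sin kx_j, \quad\text{for each } k = 1, 2, \ldots, n - 1. \]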
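
The formulas in Theorem 8.13 translate directly into code. Below is a minimal Python sketch, assuming $1 \le n < m$ and the nodes $x_j = -\pi + (j/m)\pi$; the names `discrete_ls_coefficients` and `evaluate` are hypothetical helpers introduced only for illustration.

```python
import math


def discrete_ls_coefficients(f, m, n):
    """Return (a, b) from Theorem 8.13 for f sampled at the 2m nodes x_j.

    a[k] holds a_k for k = 0..n; b[k] holds b_k for k = 1..n-1 (b[0] is unused).
    Assumes 1 <= n < m.
    """
    xs = [-math.pi + j * math.pi / m for j in range(2 * m)]
    ys = [f(x) for x in xs]
    a = [sum(y * math.cos(k * x) for x, y in zip(xs, ys)) / m
         for k in range(n + 1)]
    b = [0.0] + [sum(y * math.sin(k * x) for x, y in zip(xs, ys)) / m
                 for k in range(1, n)]
    return a, b


def evaluate(a, b, x):
    """Evaluate S_n(x) = a_0/2 + a_n cos nx + sum_{k=1}^{n-1} (a_k cos kx + b_k sin kx)."""
    n = len(a) - 1
    total = a[0] / 2.0 + a[n] * math.cos(n * x)
    for k in range(1, n):
        total += a[k] * math.cos(k * x) + b[k] * math.sin(k * x)
    return total


if __name__ == "__main__":
    a, b = discrete_ls_coefficients(lambda x: 2 * x * x - 9, 3, 2)
    print(a, b)
```

Run with $f(x) = 2x^2 - 9$, $m = 3$, and $n = 2$, this reproduces the coefficients of Example 2 to the digits shown there.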


The theorem is proved by setting the partial derivatives of $E$ with respect to the $a_k$'s and the $b_k$'s to zero, as was done in Sections 8.1 and 8.2, and applying the orthogonality to simplify the equations. For example,

\[ 0 = \frac{\partial E}{\partial b_k} = 2 \sum_{j=0}^{2m-1} \left[y_j - S_n(x_j)\right](-\sin kx_j), \]

so

\[
\begin{aligned}
0 &= \sum_{j=0}^{2m-1} y_j \sin kx_j - \sum_{j=0}^{2m-1} S_n(x_j) \sin kx_j \\
  &= \sum_{j=0}^{2m-1} y_j \sin kx_j - \frac{a_0}{2} \sum_{j=0}^{2m-1} \sin kx_j - a_n \sum_{j=0}^{2m-1} \sin kx_j \cos nx_j \\
  &\quad - \sum_{l=1}^{n-1} a_l \sum_{j=0}^{2m-1} \sin kx_j \cos lx_j - \sum_{\substack{l=1 \\ l \ne k}}^{n-1} b_l \sum_{j=0}^{2m-1} \sin kx_j \sin lx_j - b_k \sum_{j=0}^{2m-1} (\sin kx_j)^2.
\end{aligned}
\]

The orthogonality implies that all but the first and last sums on the right side are zero, and Lemma 8.12 states the final sum is $m$. Hence

\[ 0 = \sum_{j=0}^{2m-1} y_j \sin kx_j - m b_k, \]

which implies that

\[ b_k = \frac{1}{m} \sum_{j=0}^{2m-1} y_j \sin kx_j. \]

The result for the $a_k$'s is similar but needs an additional step to determine $a_0$. (See Exercise 17.)


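Example 2 Find $S_2(x)$, the discrete least squares trigonometric polynomial of degree 2 for $f(x) = 2x^2 - 9$ when $x$ is in $[-\pi, \pi]$.

Solution We have $m = 2(2) - 1 = 3$, so the nodes are

\[ x_j = -\pi + \frac{j}{m}\pi \quad\text{and}\quad y_j = f(x_j) = 2x_j^2 - 9, \quad\text{for } j = 0, 1, 2, 3, 4, 5. \]

The trigonometric polynomial is

\[ S_2(x) = \frac{1}{2} a_0 + a_2 \cos 2x + (a_1 \cos x + b_1 \sin x), \]

where

\[ a_k = \frac{1}{3} \sum_{j=0}^{5} y_j \cos kx_j, \quad\text{for } k = 0, 1, 2, \quad\text{and}\quad b_1 = \frac{1}{3} \sum_{j=0}^{5} y_j \sin x_j. \]

The coefficients are

\[ a_0 = \frac{1}{3}\left( f(-\pi) + f\left(-\frac{2\pi}{3}\right) + f\left(-\frac{\pi}{3}\right) + f(0) + f\left(\frac{\pi}{3}\right) + f\left(\frac{2\pi}{3}\right) \right) = -4.10944566, \]

\[
\begin{aligned}
a_1 = \frac{1}{3}\Big( & f(-\pi)\cos(-\pi) + f\left(-\frac{2\pi}{3}\right)\cos\left(-\frac{2\pi}{3}\right) + f\left(-\frac{\pi}{3}\right)\cos\left(-\frac{\pi}{3}\right) + f(0)\cos 0 \\
& + f\left(\frac{\pi}{3}\right)\cos\left(\frac{\pi}{3}\right) + f\left(\frac{2\pi}{3}\right)\cos\left(\frac{2\pi}{3}\right) \Big) = -8.77298169,
\end{aligned}
\]

\[
\begin{aligned}
a_2 = \frac{1}{3}\Big( & f(-\pi)\cos(-2\pi) + f\left(-\frac{2\pi}{3}\right)\cos\left(-\frac{4\pi}{3}\right) + f\left(-\frac{\pi}{3}\right)\cos\left(-\frac{2\pi}{3}\right) + f(0)\cos 0 \\
& + f\left(\frac{\pi}{3}\right)\cos\left(\frac{2\pi}{3}\right) + f\left(\frac{2\pi}{3}\right)\cos\left(\frac{4\pi}{3}\right) \Big) = 2.92432723,
\end{aligned}
\]

and

\[
\begin{aligned}
b_1 = \frac{1}{3}\Big( & f(-\pi)\sin(-\pi) + f\left(-\frac{2\pi}{3}\right)\sin\left(-\frac{2\pi}{3}\right) + f\left(-\frac{\pi}{3}\right)\sin\left(-\frac{\pi}{3}\right) + f(0)\sin 0 \\
& + f\left(\frac{\pi}{3}\right)\sin\left(\frac{\pi}{3}\right) + f\left(\frac{2\pi}{3}\right)\sin\left(\frac{2\pi}{3}\right) \Big) = 0.
\end{aligned}
\]

Thus

\[ S_2(x) = \frac{1}{2}(-4.10944566) - 8.77298169 \cos x + 2.92432723 \cos 2x. \]

Figure 8.15 shows $f(x)$ and the discrete least squares trigonometric polynomial $S_2(x)$.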
Figure 8.15


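The next example gives an illustration of finding a least-squares approximation for a function that is defined on a closed interval other than $[-\pi, \pi]$.


Example 3 Find the discrete least squares approximation $S_3(x)$ for

\[ f(x) = x^4 - 3x^3 + 2x^2 - \tan x(x - 2) \]

using the data $\{(x_j, y_j)\}_{j=0}^{9}$, where $x_j = j/5$ and $y_j = f(x_j)$. (The tangent is applied to the product $x(x - 2)$; this reading is consistent with the values of $f$ listed in Table 8.12.)

Solution We first need the linear transformation from $[0, 2]$ to $[-\pi, \pi]$ given by

\[ z_j = \pi(x_j - 1). \]

Then the transformed data have the form

\[ \left\{ \left( z_j, f\left(1 + \frac{z_j}{\pi}\right) \right) \right\}_{j=0}^{9}. \]

The least squares trigonometric polynomial is, consequently,

\[ S_3(z) = \frac{a_0}{2} + a_3 \cos 3z + \sum_{k=1}^{2} (a_k \cos kz + b_k \sin kz), \]

where

\[ a_k = \frac{1}{5} \sum_{j=0}^{9} f\left(1 + \frac{z_j}{\pi}\right) \cos kz_j, \quad\text{for } k = 0, 1, 2, 3, \]

and

\[ b_k = \frac{1}{5} \sum_{j=0}^{9} f\left(1 + \frac{z_j}{\pi}\right) \sin kz_j, \quad\text{for } k = 1, 2. \]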
Evaluating these sums produces the approximation

\[ S_3(z) = 0.76201 + 0.77177 \cos z + 0.017423 \cos 2z + 0.0065673 \cos 3z - 0.38676 \sin z + 0.047806 \sin 2z, \]

and converting back to the variable $x$ gives

\[
\begin{aligned}
S_3(x) = {} & 0.76201 + 0.77177 \cos \pi(x - 1) + 0.017423 \cos 2\pi(x - 1) \\
& + 0.0065673 \cos 3\pi(x - 1) - 0.38676 \sin \pi(x - 1) + 0.047806 \sin 2\pi(x - 1).
\end{aligned}
\]

Table 8.12 lists values of $f(x)$ and $S_3(x)$.

Table 8.12
x        f(x)        S_3(x)      |f(x) - S_3(x)|
0.125    0.26440     0.24060     2.38 x 10^-2
0.375    0.84081     0.85154     1.07 x 10^-2
0.625    1.36150     1.36248     9.74 x 10^-4
0.875    1.61282     1.60406     8.75 x 10^-3
1.125    1.36672     1.37566     8.94 x 10^-3
1.375    0.71697     0.71545     1.52 x 10^-3
1.625    0.07909     0.06929     9.80 x 10^-3
1.875    -0.14576    -0.12302    2.27 x 10^-2


EXERCISE SET 8.5


1. Find the continuous least squares trigonometric polynomial $S_2(x)$ for $f(x) = x^2$ on $[-\pi, \pi]$.

2. Find the continuous least squares trigonometric polynomial $S_n(x)$ for $f(x) = x$ on $[-\pi, \pi]$.

3. Find the continuous least squares trigonometric polynomial $S_3(x)$ for $f(x) = e^x$ on $[-\pi, \pi]$.

4. Find the general continuous least squares trigonometric polynomial $S_n(x)$ for $f(x) = e^x$ on $[-\pi, \pi]$.

5. Find the general continuous least squares trigonometric polynomial $S_n(x)$ for

\[ f(x) = \begin{cases} 0, & \text{if } -\pi < x \le 0, \\ 1, & \text{if } 0 < x < \pi. \end{cases} \]

6. Find the general continuous least squares trigonometric polynomial $S_n(x)$ for

\[ f(x) = \begin{cases} -1, & \text{if } -\pi < x < 0, \\ 1, & \text{if } 0 \le x \le \pi. \end{cases} \]

7. Determine the discrete least squares trigonometric polynomial $S_n(x)$ on the interval $[-\pi, \pi]$ for the following functions, using the given values of $m$ and $n$:
   a. $f(x) = \cos 2x$, $m = 4$, $n = 2$   b. $f(x) = \cos 3x$, $m = 4$, $n = 2$
   c. $f(x) = \sin\frac{x}{2} + 2\cos\frac{x}{3}$, $m = 6$, $n = 3$   d. $f(x) = x^2 \cos x$, $m = 6$, $n = 3$

8. Compute the error $E(S_n)$ for each of the functions in Exercise 7.

9. Determine the discrete least squares trigonometric polynomial $S_3(x)$, using $m = 4$, for $f(x) = e^x \cos 2x$ on the interval $[-\pi, \pi]$. Compute the error $E(S_3)$.

10. Repeat Exercise 9 using $m = 8$. Compare the values of the approximating polynomials with the values of $f$ at the points $\xi_j = -\pi + 0.2j\pi$, for $0 \le j \le 10$. Which approximation is better?

11. Let $f(x) = 2\tan x - \sec 2x$, for $2 \le x \le 4$. Determine the discrete least squares trigonometric polynomials $S_n(x)$, using the values of $n$ and $m$ as follows, and compute the error in each case.
   a. $n = 3$, $m = 6$   b. $n = 4$, $m = 6$

12. a. Determine the discrete least squares trigonometric polynomial $S_4(x)$, using $m = 16$, for $f(x) = x^2 \sin x$ on the interval $[0, 1]$.
   b. Compute $\int_0^1 S_4(x)\, dx$.
   c. Compare the integral in part (b) to $\int_0^1 x^2 \sin x \, dx$.

13. Show that for any continuous odd function $f$ defined on the interval $[-a, a]$, we have $\int_{-a}^{a} f(x)\, dx = 0$.

14. Show that for any continuous even function $f$ defined on the interval $[-a, a]$, we have $\int_{-a}^{a} f(x)\, dx = 2\int_{0}^{a} f(x)\, dx$.

15. Show that the functions $\phi_0(x) = 1/2$, $\phi_1(x) = \cos x, \ldots, \phi_n(x) = \cos nx$, $\phi_{n+1}(x) = \sin x, \ldots, \phi_{2n-1}(x) = \sin(n - 1)x$ are orthogonal on $[-\pi, \pi]$ with respect to $w(x) \equiv 1$.

16. In Example 1 the Fourier series was determined for $f(x) = |x|$. Use this series and the assumption that it represents $f$ at zero to find the value of the convergent infinite series $\sum_{k=0}^{\infty} \left(1/(2k + 1)^2\right)$.

17. Show that the form of the constants $a_k$ for $k = 0, \ldots, n$ in Theorem 8.13 is correct as stated.


8.6 Fast Fourier Transforms


In the latter part of Section 8.5, we determined the form of the discrete least squares polynomial of degree $n$ on the $2m$ data points $\{(x_j, y_j)\}_{j=0}^{2m-1}$, where $x_j = -\pi + (j/m)\pi$, for each $j = 0, 1, \ldots, 2m - 1$.

The interpolatory trigonometric polynomial in $\mathcal{T}_m$ on these $2m$ data points is nearly the same as the least squares polynomial. This is because the least squares trigonometric polynomial minimizes the error term

\[ E(S_m) = \sum_{j=0}^{2m-1} \left(y_j - S_m(x_j)\right)^2, \]

and for the interpolatory trigonometric polynomial, this error is 0, hence minimized, when $S_m(x_j) = y_j$, for each $j = 0, 1, \ldots, 2m - 1$.

A modification is needed to the form of the polynomial, however, if we want the coefficients to assume the same form as in the least squares case. In Lemma 8.12 we found that if $r$ is not a multiple of $m$, then

\[ \sum_{j=0}^{2m-1} (\cos rx_j)^2 = m. \]

Interpolation requires computing instead

\[ \sum_{j=0}^{2m-1} (\cos mx_j)^2, \]

which (see Exercise 8) has the value $2m$. This requires the interpolatory polynomial to be written as

\[ S_m(x) = \frac{a_0 + a_m \cos mx}{2} + \sum_{k=1}^{m-1} (a_k \cos kx + b_k \sin kx), \tag{8.26} \]

if we want the form of the constants $a_k$ and $b_k$ to agree with those of the discrete least squares polynomial; that is,

\[ a_k = \frac{1}{m} \sum_{j=0}^{2m-1} y_j \cos kx_j, \quad\text{for each } k = 0, 1, \ldots, m, \quad\text{and} \]

\[ b_k = \frac{1}{m} \sum_{j=0}^{2m-1} y_j \sin kx_j, \quad\text{for each } k = 1, 2, \ldots, m - 1. \]

Leonhard Euler first gave this formula in 1748 in Introductio in analysin infinitorum, which made the ideas of Johann Bernoulli more precise. This work bases the calculus on the theory of elementary functions rather than curves.

The interpolation of large amounts of equally-spaced data by trigonometric polyno- 
mials can produce very accurate results. It is the appropriate approximation technique in 
areas involving digital filters, antenna field patterns, quantum mechanics, optics, and in 
numerous simulation problems. Until the middle of the 1960s, however, the method had 
not been extensively applied due to the number of arithmetic calculations required for the 
determination of the constants in the approximation. 

The interpolation of $2m$ data points by the direct-calculation technique requires approximately $(2m)^2$ multiplications and $(2m)^2$ additions. The approximation of many thousands of data points is not unusual in areas requiring trigonometric interpolation, so the direct methods for evaluating the constants require multiplication and addition operations numbering in the millions. The roundoff error associated with this number of calculations generally dominates the approximation.

In 1965, a paper by J. W. Cooley and J. W. Tukey in the journal Mathematics of Computation [CT] described a different method of calculating the constants in the interpolating trigonometric polynomial. This method requires only $O(m \log_2 m)$ multiplications and $O(m \log_2 m)$ additions, provided $m$ is chosen in an appropriate manner. For a problem with thousands of data points, this reduces the number of calculations from millions to thousands. The method had actually been discovered a number of years before the Cooley-Tukey paper appeared but had gone largely unnoticed. ([Brigh], pp. 8-9, contains a short, but interesting, historical summary of the method.)

The method described by Cooley and Tukey is known either as the Cooley-Tukey 
algorithm or the fast Fourier transform (FFT) algorithm and has led to a revolution in 
the use of interpolatory trigonometric polynomials. The method consists of organizing the 
problem so that the number of data points being used can be easily factored, particularly 
into powers of two. 

Instead of directly evaluating the constants $a_k$ and $b_k$, the fast Fourier transform procedure computes the complex coefficients $c_k$ in

\[ \frac{1}{m} \sum_{k=0}^{2m-1} c_k e^{ikx}, \tag{8.27} \]

where

\[ c_k = \sum_{j=0}^{2m-1} y_j e^{ik\pi j/m}, \quad\text{for each } k = 0, 1, \ldots, 2m - 1. \tag{8.28} \]

Once the constants $c_k$ have been determined, $a_k$ and $b_k$ can be recovered by using Euler's Formula,

\[ e^{iz} = \cos z + i \sin z. \]
For each $k = 0, 1, \ldots, m$ we have

\[
\begin{aligned}
\frac{1}{m} c_k (-1)^k = \frac{1}{m} c_k e^{-i\pi k} &= \frac{1}{m} \sum_{j=0}^{2m-1} y_j e^{ik\pi j/m} e^{-i\pi k} = \frac{1}{m} \sum_{j=0}^{2m-1} y_j e^{ik(-\pi + (\pi j/m))} \\
&= \frac{1}{m} \sum_{j=0}^{2m-1} y_j \left( \cos k\left(-\pi + \frac{\pi j}{m}\right) + i \sin k\left(-\pi + \frac{\pi j}{m}\right) \right) \\
&= \frac{1}{m} \sum_{j=0}^{2m-1} y_j (\cos kx_j + i \sin kx_j).
\end{aligned}
\]

So, given $c_k$, we have

\[ a_k + i b_k = \frac{(-1)^k}{m} c_k. \tag{8.29} \]

For notational convenience, $b_0$ and $b_m$ are added to the collection, but both are 0 and do not contribute to the resulting sum.
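
Equations (8.28) and (8.29) can be exercised with a direct, quadratic-cost computation before any fast algorithm is introduced. The following Python sketch is illustrative only; `direct_coefficients` and `recover_a_b` are hypothetical names, and the input is assumed to be the list $y_0, \ldots, y_{2m-1}$ sampled at $x_j = -\pi + j\pi/m$.

```python
import cmath
import math


def direct_coefficients(ys):
    """c_k = sum_j y_j exp(i k pi j / m) for k = 0..2m-1, computed term by term."""
    two_m = len(ys)
    m = two_m // 2
    return [sum(y * cmath.exp(1j * k * math.pi * j / m) for j, y in enumerate(ys))
            for k in range(two_m)]


def recover_a_b(cs):
    """Use a_k + i b_k = (-1)^k c_k / m for k = 0..m to recover the real coefficients."""
    m = len(cs) // 2
    scaled = [(-1) ** k * cs[k] / m for k in range(m + 1)]
    return [z.real for z in scaled], [z.imag for z in scaled]


if __name__ == "__main__":
    m = 2
    xs = [-math.pi + j * math.pi / m for j in range(2 * m)]
    ys = [2 * x * x - 9 for x in xs]
    a, b = recover_a_b(direct_coefficients(ys))
    print(a, b)
```

This direct version uses $(2m)^2$ complex multiplications, which is the cost the fast Fourier transform is designed to avoid.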

The operation-reduction feature of the fast Fourier transform results from calculating the coefficients $c_k$ in clusters, and uses as a basic relation the fact that for any integer $n$,

\[ e^{n\pi i} = \cos n\pi + i \sin n\pi = (-1)^n. \]

Suppose $m = 2^p$ for some positive integer $p$. For each $k = 0, 1, \ldots, m - 1$ we have

\[ c_k + c_{m+k} = \sum_{j=0}^{2m-1} y_j e^{ik\pi j/m} + \sum_{j=0}^{2m-1} y_j e^{i(m+k)\pi j/m} = \sum_{j=0}^{2m-1} y_j e^{ik\pi j/m}\left(1 + e^{\pi i j}\right). \]

But

\[ 1 + e^{i\pi j} = \begin{cases} 2, & \text{if $j$ is even,} \\ 0, & \text{if $j$ is odd,} \end{cases} \]

so there are only $m$ nonzero terms to be summed.

If $j$ is replaced by $2j$ in the index of the sum, we can write the sum as

\[ c_k + c_{m+k} = 2 \sum_{j=0}^{m-1} y_{2j} e^{ik\pi(2j)/m}; \]

that is,

\[ c_k + c_{m+k} = 2 \sum_{j=0}^{m-1} y_{2j} e^{ik\pi j/(m/2)}. \tag{8.30} \]

In a similar manner,

\[ c_k - c_{m+k} = 2 e^{ik\pi/m} \sum_{j=0}^{m-1} y_{2j+1} e^{ik\pi j/(m/2)}. \tag{8.31} \]

Since $c_k$ and $c_{m+k}$ can both be recovered from Eqs. (8.30) and (8.31), these relations determine all the coefficients $c_k$. Note also that the sums in Eqs. (8.30) and (8.31) are of the same form as the sum in Eq. (8.28), except that the index $m$ has been replaced by $m/2$.
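
The splitting relations (8.30) and (8.31) can be confirmed numerically on a small data set. The sketch below is a minimal Python check under the assumption that the data list has even length $2m$; the names `c_direct` and `split_sums` are illustrative, and the sample values are arbitrary.

```python
import cmath
import math


def c_direct(ys, k):
    """c_k from (8.28): sum over all 2m data values."""
    m = len(ys) // 2
    return sum(y * cmath.exp(1j * k * math.pi * j / m) for j, y in enumerate(ys))


def split_sums(ys, k):
    """Right-hand sides of (8.30) and (8.31): sums over even- and odd-indexed data."""
    m = len(ys) // 2
    half = m / 2.0
    even = sum(ys[2 * j] * cmath.exp(1j * k * math.pi * j / half) for j in range(m))
    odd = sum(ys[2 * j + 1] * cmath.exp(1j * k * math.pi * j / half) for j in range(m))
    return 2 * even, 2 * cmath.exp(1j * k * math.pi / m) * odd


if __name__ == "__main__":
    data = [1.0, -2.0, 0.5, 3.0, 4.0, -1.5, 2.5, 0.0]   # arbitrary sample values
    m = len(data) // 2
    for k in range(m):
        plus, minus = split_sums(data, k)
        print(k, abs(c_direct(data, k) + c_direct(data, k + m) - plus),
              abs(c_direct(data, k) - c_direct(data, k + m) - minus))
```

Each printed difference should be at the level of floating-point roundoff, which shows that two half-length sums carry the same information as $c_k$ and $c_{m+k}$.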
Illustration


There are $2m$ coefficients $c_0, c_1, \ldots, c_{2m-1}$ to be calculated. Using the basic formula (8.28) requires $2m$ complex multiplications per coefficient, for a total of $(2m)^2$ operations. Equation (8.30) requires $m$ complex multiplications for each $k = 0, 1, \ldots, m - 1$, and (8.31) requires $m + 1$ complex multiplications for each $k = 0, 1, \ldots, m - 1$. Using these equations to compute $c_0, c_1, \ldots, c_{2m-1}$ reduces the number of complex multiplications from $(2m)^2 = 4m^2$ to

\[ m \cdot m + m(m + 1) = 2m^2 + m. \]

The sums in (8.30) and (8.31) have the same form as the original and $m$ is a power of 2, so the reduction technique can be reapplied to the sums in (8.30) and (8.31). Each of these is replaced by two sums from $j = 0$ to $j = (m/2) - 1$. This reduces the $2m^2$ portion of the sum to

\[ 2\left[ \frac{m}{2} \cdot \frac{m}{2} + \frac{m}{2} \cdot \left(\frac{m}{2} + 1\right) \right] = m^2 + m. \]

So a total of

\[ (m^2 + m) + m = m^2 + 2m \]

complex multiplications are now needed, instead of $(2m)^2$.

Applying the technique one more time gives us 4 sums each with $m/4$ terms and reduces the $m^2$ portion of this total to

\[ 4\left[ \left(\frac{m}{4}\right)^2 + \frac{m}{4}\left(\frac{m}{4} + 1\right) \right] = \frac{m^2}{2} + m, \]

for a new total of $(m^2/2) + 3m$ complex multiplications. Repeating the process $r$ times reduces the total number of required complex multiplications to

\[ \frac{m^2}{2^{r-2}} + mr. \]

The process is complete when $r = p + 1$, because we then have $m = 2^p$ and $2m = 2^{p+1}$. As a consequence, after $r = p + 1$ reductions of this type, the number of complex multiplications is reduced from $(2m)^2$ to

\[ \frac{(2^p)^2}{2^{p-1}} + m(p + 1) = 2m + pm + m = 3m + m \log_2 m = O(m \log_2 m). \]

Because of the way the calculations are arranged, the number of required complex additions is comparable.
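
The count $m^2/2^{r-2} + mr$ is simple to tabulate, which makes the effect of each reduction visible. The short Python sketch below only evaluates that formula; `multiplications_after` and `fft_multiplications` are hypothetical names, not part of the text.

```python
def multiplications_after(m, r):
    """Complex multiplications after r reductions: m^2 / 2^(r-2) + m*r."""
    return m * m / 2 ** (r - 2) + m * r


def fft_multiplications(p):
    """Count after the full r = p + 1 reductions when m = 2^p."""
    m = 2 ** p
    return multiplications_after(m, p + 1)


if __name__ == "__main__":
    p = 10
    m = 2 ** p
    print("direct:", (2 * m) ** 2)
    for r in range(1, p + 2):
        print(r, multiplications_after(m, r))
```

For $m = 2^{10}$ the last line of the table is $3m + m \log_2 m = 13{,}312$, compared with $(2m)^2 = 4{,}194{,}304$ for the direct calculation.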

To illustrate the significance of this reduction, suppose we have $m = 2^{10} = 1024$. The direct calculation of the $c_k$, for $k = 0, 1, \ldots, 2m - 1$, would require

\[ (2m)^2 = (2048)^2 \approx 4{,}200{,}000 \]

calculations. The fast Fourier transform procedure reduces the number of calculations to

\[ 3(1024) + 1024 \log_2 1024 \approx 13{,}300. \]
Consider the fast Fourier transform technique applied to $8 = 2^3$ data points $\{(x_j, y_j)\}_{j=0}^{7}$, where $x_j = -\pi + j\pi/4$, for each $j = 0, 1, \ldots, 7$. In this case $2m = 8$, so $m = 4 = 2^2$ and $p = 2$.

From Eq. (8.26) we have

\[ S_4(x) = \frac{a_0 + a_4 \cos 4x}{2} + \sum_{k=1}^{3} (a_k \cos kx + b_k \sin kx), \]

where

\[ a_k = \frac{1}{4} \sum_{j=0}^{7} y_j \cos kx_j \quad\text{and}\quad b_k = \frac{1}{4} \sum_{j=0}^{7} y_j \sin kx_j, \quad k = 0, 1, 2, 3, 4. \]

Define the Fourier transform as

\[ \frac{1}{4} \sum_{k=0}^{7} c_k e^{ikx}, \]

where

\[ c_k = \sum_{j=0}^{7} y_j e^{ik\pi j/4}, \quad\text{for } k = 0, 1, \ldots, 7. \]

Then by Eq. (8.29), for $k = 0, 1, 2, 3, 4$, we have

\[ \frac{1}{4} c_k e^{-ik\pi} = a_k + i b_k. \]


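By direct calculation, the complex constants $c_k$ are given by

\[
\begin{aligned}
c_0 &= y_0 + y_1 + y_2 + y_3 + y_4 + y_5 + y_6 + y_7; \\
c_1 &= y_0 + \left(\frac{i + 1}{\sqrt{2}}\right) y_1 + i y_2 + \left(\frac{i - 1}{\sqrt{2}}\right) y_3 - y_4 - \left(\frac{i + 1}{\sqrt{2}}\right) y_5 - i y_6 - \left(\frac{i - 1}{\sqrt{2}}\right) y_7; \\
c_2 &= y_0 + i y_1 - y_2 - i y_3 + y_4 + i y_5 - y_6 - i y_7; \\
c_3 &= y_0 + \left(\frac{i - 1}{\sqrt{2}}\right) y_1 - i y_2 + \left(\frac{i + 1}{\sqrt{2}}\right) y_3 - y_4 - \left(\frac{i - 1}{\sqrt{2}}\right) y_5 + i y_6 - \left(\frac{i + 1}{\sqrt{2}}\right) y_7; \\
c_4 &= y_0 - y_1 + y_2 - y_3 + y_4 - y_5 + y_6 - y_7; \\
c_5 &= y_0 - \left(\frac{i + 1}{\sqrt{2}}\right) y_1 + i y_2 - \left(\frac{i - 1}{\sqrt{2}}\right) y_3 - y_4 + \left(\frac{i + 1}{\sqrt{2}}\right) y_5 - i y_6 + \left(\frac{i - 1}{\sqrt{2}}\right) y_7; \\
c_6 &= y_0 - i y_1 - y_2 + i y_3 + y_4 - i y_5 - y_6 + i y_7; \\
c_7 &= y_0 - \left(\frac{i - 1}{\sqrt{2}}\right) y_1 - i y_2 - \left(\frac{i + 1}{\sqrt{2}}\right) y_3 - y_4 + \left(\frac{i - 1}{\sqrt{2}}\right) y_5 + i y_6 + \left(\frac{i + 1}{\sqrt{2}}\right) y_7.
\end{aligned}
\]

Because of the small size of the collection of data points, many of the coefficients of the $y_j$ in these equations are 1 or $-1$. This frequency will decrease in a larger application, so to count the computational operations accurately, multiplication by 1 or $-1$ will be included, even though it would not be necessary in this example. With this understanding, 64 multiplications/divisions and 56 additions/subtractions are required for the direct computation of $c_0, c_1, \ldots, c_7$.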
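To apply the fast Fourier transform procedure with $r = 1$, we first define

\[
\begin{aligned}
d_0 &= \frac{c_0 + c_4}{2} = y_0 + y_2 + y_4 + y_6; & d_4 &= \frac{c_2 + c_6}{2} = y_0 - y_2 + y_4 - y_6; \\
d_1 &= \frac{c_0 - c_4}{2} = y_1 + y_3 + y_5 + y_7; & d_5 &= \frac{c_2 - c_6}{2} = i(y_1 - y_3 + y_5 - y_7); \\
d_2 &= \frac{c_1 + c_5}{2} = y_0 + i y_2 - y_4 - i y_6; & d_6 &= \frac{c_3 + c_7}{2} = y_0 - i y_2 - y_4 + i y_6; \\
d_3 &= \frac{c_1 - c_5}{2} = \left(\frac{i + 1}{\sqrt{2}}\right)(y_1 + i y_3 - y_5 - i y_7); & d_7 &= \frac{c_3 - c_7}{2} = \left(\frac{i - 1}{\sqrt{2}}\right)(y_1 - i y_3 - y_5 + i y_7).
\end{aligned}
\]

We then define, for $r = 2$,

\[
\begin{aligned}
e_0 &= \frac{d_0 + d_4}{2} = y_0 + y_4; & e_4 &= \frac{d_2 + d_6}{2} = y_0 - y_4; \\
e_1 &= \frac{d_0 - d_4}{2} = y_2 + y_6; & e_5 &= \frac{d_2 - d_6}{2} = i(y_2 - y_6); \\
e_2 &= \frac{i d_1 + d_5}{2} = i(y_1 + y_5); & e_6 &= \frac{i d_3 + d_7}{2} = \left(\frac{i - 1}{\sqrt{2}}\right)(y_1 - y_5); \\
e_3 &= \frac{i d_1 - d_5}{2} = i(y_3 + y_7); & e_7 &= \frac{i d_3 - d_7}{2} = i\left(\frac{i - 1}{\sqrt{2}}\right)(y_3 - y_7).
\end{aligned}
\]

Finally, for $r = p + 1 = 3$, we define

\[
\begin{aligned}
f_0 &= \frac{e_0 + e_4}{2} = y_0; & f_4 &= \frac{\left((i + 1)/\sqrt{2}\right) e_2 + e_6}{2} = \left(\frac{i - 1}{\sqrt{2}}\right) y_1; \\
f_1 &= \frac{e_0 - e_4}{2} = y_4; & f_5 &= \frac{\left((i + 1)/\sqrt{2}\right) e_2 - e_6}{2} = \left(\frac{i - 1}{\sqrt{2}}\right) y_5; \\
f_2 &= \frac{i e_1 + e_5}{2} = i y_2; & f_6 &= \frac{\left((i - 1)/\sqrt{2}\right) e_3 + e_7}{2} = \left(\frac{-i - 1}{\sqrt{2}}\right) y_3; \\
f_3 &= \frac{i e_1 - e_5}{2} = i y_6; & f_7 &= \frac{\left((i - 1)/\sqrt{2}\right) e_3 - e_7}{2} = \left(\frac{-i - 1}{\sqrt{2}}\right) y_7.
\end{aligned}
\]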
The $c_0, \ldots, c_7$, $d_0, \ldots, d_7$, $e_0, \ldots, e_7$, and $f_0, \ldots, f_7$ are independent of the particular data points; they depend only on the fact that $m = 4$. For each $m$ there is a unique set of constants $\{c_k\}_{k=0}^{2m-1}$, $\{d_k\}_{k=0}^{2m-1}$, $\{e_k\}_{k=0}^{2m-1}$, and $\{f_k\}_{k=0}^{2m-1}$. This portion of the work is not needed for a particular application; only the following calculations are required:


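The $f_k$:

\[
\begin{aligned}
& f_0 = y_0; \quad f_1 = y_4; \quad f_2 = i y_2; \quad f_3 = i y_6; \\
& f_4 = \left(\frac{i - 1}{\sqrt{2}}\right) y_1; \quad f_5 = \left(\frac{i - 1}{\sqrt{2}}\right) y_5; \quad f_6 = -\left(\frac{i + 1}{\sqrt{2}}\right) y_3; \quad f_7 = -\left(\frac{i + 1}{\sqrt{2}}\right) y_7.
\end{aligned}
\]

The $e_k$:

\[
\begin{aligned}
& e_0 = f_0 + f_1; \quad e_1 = -i(f_2 + f_3); \quad e_2 = -\left(\frac{i - 1}{\sqrt{2}}\right)(f_4 + f_5); \quad e_3 = -\left(\frac{i + 1}{\sqrt{2}}\right)(f_6 + f_7); \\
& e_4 = f_0 - f_1; \quad e_5 = f_2 - f_3; \quad e_6 = f_4 - f_5; \quad e_7 = f_6 - f_7.
\end{aligned}
\]

The $d_k$:

\[
\begin{aligned}
& d_0 = e_0 + e_1; \quad d_1 = -i(e_2 + e_3); \quad d_2 = e_4 + e_5; \quad d_3 = -i(e_6 + e_7); \\
& d_4 = e_0 - e_1; \quad d_5 = e_2 - e_3; \quad d_6 = e_4 - e_5; \quad d_7 = e_6 - e_7.
\end{aligned}
\]

The $c_k$:

\[
\begin{aligned}
& c_0 = d_0 + d_1; \quad c_1 = d_2 + d_3; \quad c_2 = d_4 + d_5; \quad c_3 = d_6 + d_7; \\
& c_4 = d_0 - d_1; \quad c_5 = d_2 - d_3; \quad c_6 = d_4 - d_5; \quad c_7 = d_6 - d_7.
\end{aligned}
\]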

Computing the constants $c_0, c_1, \ldots, c_7$ in this manner requires the number of operations shown in Table 8.13. Note again that multiplication by 1 or $-1$ has been included in the count, even though this does not require computational effort.


Table 8.13
Step          Multiplications/divisions    Additions/subtractions
(The f_k:)    8                            0
(The e_k:)    8                            8
(The d_k:)    8                            8
(The c_k:)    0                            8
Total         24                           24


The lack of multiplications/divisions when finding the $c_k$ reflects the fact that for any $m$, the coefficients $\{c_k\}_{k=0}^{2m-1}$ are computed from $\{d_k\}_{k=0}^{2m-1}$ in the same manner:

\[ c_k = d_{2k} + d_{2k+1} \quad\text{and}\quad c_{k+m} = d_{2k} - d_{2k+1}, \quad\text{for } k = 0, 1, \ldots, m - 1, \]

so no complex multiplication is involved.

In summary, the direct computation of the coefficients $c_0, c_1, \ldots, c_7$ requires 64 multiplications/divisions and 56 additions/subtractions. The fast Fourier transform technique reduces the computations to 24 multiplications/divisions and 24 additions/subtractions.
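
The four stages of the 8-point illustration can be transcribed almost literally into code and compared with the direct sums. The Python sketch below is only a check of the illustration for $m = 4$, not a general implementation; `staged_fft8` and `direct8` are hypothetical names, and `W` and `V` stand for $(i - 1)/\sqrt{2}$ and $(i + 1)/\sqrt{2}$.

```python
import cmath
import math

W = (1j - 1) / math.sqrt(2)   # (i - 1)/sqrt(2)
V = (1j + 1) / math.sqrt(2)   # (i + 1)/sqrt(2)


def staged_fft8(y):
    """Compute c_0..c_7 through the f_k, e_k, d_k stages of the illustration."""
    f = [y[0], y[4], 1j * y[2], 1j * y[6],
         W * y[1], W * y[5], -V * y[3], -V * y[7]]
    e = [f[0] + f[1], -1j * (f[2] + f[3]), -W * (f[4] + f[5]), -V * (f[6] + f[7]),
         f[0] - f[1], f[2] - f[3], f[4] - f[5], f[6] - f[7]]
    d = [e[0] + e[1], -1j * (e[2] + e[3]), e[4] + e[5], -1j * (e[6] + e[7]),
         e[0] - e[1], e[2] - e[3], e[4] - e[5], e[6] - e[7]]
    return [d[0] + d[1], d[2] + d[3], d[4] + d[5], d[6] + d[7],
            d[0] - d[1], d[2] - d[3], d[4] - d[5], d[6] - d[7]]


def direct8(y):
    """c_k = sum_j y_j exp(i k pi j / 4), the definition used in the illustration."""
    return [sum(y[j] * cmath.exp(1j * k * math.pi * j / 4) for j in range(8))
            for k in range(8)]


if __name__ == "__main__":
    sample = [1.0, 2.0, -1.0, 0.5, 3.0, -2.0, 4.0, 0.25]   # arbitrary data
    for staged, direct in zip(staged_fft8(sample), direct8(sample)):
        print(abs(staged - direct))
```

Note how the data enter the first stage in the order $y_0, y_4, y_2, y_6, y_1, y_5, y_3, y_7$, which is the bit-reversed ordering that the general algorithm has to account for.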


Algorithm 8.3 performs the fast Fourier transform when m = 2? for some positive 
integer p. Modifications of the technique can be made when m takes other forms. 


Fast Fourier Transform 


To compute the coefficients in the summation 

\[ \frac{1}{m}\sum_{k=0}^{2m-1} c_k e^{ikx} = \frac{1}{m}\sum_{k=0}^{2m-1} c_k(\cos kx + i\sin kx), \quad\text{where } i = \sqrt{-1}, \]

for the data $\{(x_j, y_j)\}_{j=0}^{2m-1}$, where $m = 2^p$ and $x_j = -\pi + j\pi/m$ for $j = 0, 1, \ldots, 2m-1$:

INPUT $m$, $p$; $y_0, y_1, \ldots, y_{2m-1}$.
OUTPUT complex numbers $c_0, \ldots, c_{2m-1}$; real numbers $a_0, \ldots, a_m$; $b_1, \ldots, b_{m-1}$.
Step 1 Set $M = m$;
           $q = p$;
           $\zeta = e^{\pi i/m}$.
Step 2 For $j = 0, 1, \ldots, 2m-1$ set $c_j = y_j$.
Step 3 For $j = 1, 2, \ldots, M$ set $\xi_j = \zeta^j$;
           $\xi_{j+M} = -\xi_j$.
Step 4 Set $K = 0$;
           $\xi_0 = 1$.
Step 5 For $L = 1, 2, \ldots, p+1$ do Steps 6-12.
    Step 6 While $K < 2m-1$ do Steps 7-11.
        Step 7 For $j = 1, 2, \ldots, M$ do Steps 8-10.
            Step 8 Let $K = k_p \cdot 2^p + k_{p-1} \cdot 2^{p-1} + \cdots + k_1 \cdot 2 + k_0$;
                       (Decompose $K$.)
                   set $K_1 = K/2^q = k_p \cdot 2^{p-q} + \cdots + k_{q+1} \cdot 2 + k_q$;
                       $K_2 = k_q \cdot 2^p + k_{q+1} \cdot 2^{p-1} + \cdots + k_p \cdot 2^q$.
            Step 9 Set $\eta = c_{K+M}\,\xi_{K_2}$;
                       $c_{K+M} = c_K - \eta$;
                       $c_K = c_K + \eta$.
            Step 10 Set $K = K + 1$.
        Step 11 Set $K = K + M$.
    Step 12 Set $K = 0$;
                $M = M/2$;
                $q = q - 1$.
Step 13 While $K < 2m-1$ do Steps 14-16.
    Step 14 Let $K = k_p \cdot 2^p + k_{p-1} \cdot 2^{p-1} + \cdots + k_1 \cdot 2 + k_0$; (Decompose $K$.)
            set $j = k_0 \cdot 2^p + k_1 \cdot 2^{p-1} + \cdots + k_{p-1} \cdot 2 + k_p$.
    Step 15 If $j > K$ then interchange $c_j$ and $c_K$.
    Step 16 Set $K = K + 1$.
Step 17 Set $a_0 = c_0/m$;
            $a_m = \mathrm{Re}(e^{-i\pi m} c_m/m)$.
Step 18 For $j = 1, \ldots, m-1$ set $a_j = \mathrm{Re}(e^{-i\pi j} c_j/m)$;
            $b_j = \mathrm{Im}(e^{-i\pi j} c_j/m)$.
Step 19 OUTPUT $(c_0, \ldots, c_{2m-1};\ a_0, \ldots, a_m;\ b_1, \ldots, b_{m-1})$;
            STOP.
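As a rough companion to Algorithm 8.3, the following Python sketch computes the sums $c_k = \sum_{j=0}^{2m-1} y_j e^{ik\pi j/m}$ and then forms $a_k$ and $b_k$ as in Steps 17 and 18. It uses a recursive even/odd splitting rather than the in-place, bit-reversal scheme of the algorithm; this is a swapped-in formulation chosen for brevity, and the function names are illustrative assumptions, not part of the text.

```python
import cmath
import math


def fft_sums(y):
    """Return c_k = sum_j y[j]*exp(i*k*pi*j/m), k = 0..2m-1; len(y) = 2m must be a power of 2."""
    n = len(y)
    if n == 1:
        return [complex(y[0])]
    even = fft_sums(y[0::2])  # contribution of the even-indexed data
    odd = fft_sums(y[1::2])   # contribution of the odd-indexed data
    half = n // 2
    c = [0j] * n
    for k in range(half):
        eta = cmath.exp(2j * cmath.pi * k / n) * odd[k]
        c[k] = even[k] + eta         # same pattern as c_K = c_K + eta
        c[k + half] = even[k] - eta  # same pattern as c_{K+M} = c_K - eta
    return c


def direct_sums(y):
    """The same sums computed term by term, for comparison."""
    n = len(y)
    return [sum(y[j] * cmath.exp(2j * cmath.pi * j * k / n) for j in range(n))
            for k in range(n)]


def trig_coefficients(y):
    """Return ([a_0..a_m], [b_1..b_{m-1}]) for data at x_j = -pi + j*pi/m."""
    m = len(y) // 2
    c = fft_sums(y)
    scaled = [cmath.exp(-1j * cmath.pi * k) * c[k] / m for k in range(m + 1)]
    a = [z.real for z in scaled]
    b = [z.imag for z in scaled[1:m]]
    return a, b


# Illustrative data (an assumption): samples of cos x + 2 sin 2x with m = 4.
m = 4
x = [-math.pi + j * math.pi / m for j in range(2 * m)]
y = [math.cos(t) + 2 * math.sin(2 * t) for t in x]
a, b = trig_coefficients(y)
```

For this test data the computed coefficients should reproduce the function itself, that is, $a_1 \approx 1$ and $b_2 \approx 2$ with the remaining coefficients near zero.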


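Example 1 Find the interpolating trigonometric polynomial of degree 2 on $[-\pi, \pi]$ for the data
$\{(x_j, f(x_j))\}_{j=0}^{3}$, where

\[ a_k = \frac{1}{2}\sum_{j=0}^{3} f(x_j)\cos(kx_j) \ \text{ for } k = 0, 1, 2 \quad\text{and}\quad b_1 = \frac{1}{2}\sum_{j=0}^{3} f(x_j)\sin(x_j). \]

Solution We have 

\[ a_0 = \frac{1}{2}\left(f(-\pi) + f\left(-\frac{\pi}{2}\right) + f(0) + f\left(\frac{\pi}{2}\right)\right) = -3.19559339, \]

\[ a_1 = \frac{1}{2}\left(f(-\pi)\cos(-\pi) + f\left(-\frac{\pi}{2}\right)\cos\left(-\frac{\pi}{2}\right) + f(0)\cos 0 + f\left(\frac{\pi}{2}\right)\cos\left(\frac{\pi}{2}\right)\right) = -9.86960441, \]

\[ a_2 = \frac{1}{2}\left(f(-\pi)\cos(-2\pi) + f\left(-\frac{\pi}{2}\right)\cos(-\pi) + f(0)\cos 0 + f\left(\frac{\pi}{2}\right)\cos(\pi)\right) = 4.93480220, \]

and 

\[ b_1 = \frac{1}{2}\left(f(-\pi)\sin(-\pi) + f\left(-\frac{\pi}{2}\right)\sin\left(-\frac{\pi}{2}\right) + f(0)\sin 0 + f\left(\frac{\pi}{2}\right)\sin\left(\frac{\pi}{2}\right)\right) = 0. \]

So 

\[ S_2(x) = \frac{1}{2}\left(-3.19559339 + 4.93480220\cos 2x\right) - 9.86960441\cos x. \]

Figure 8.16 shows $f(x)$ and the interpolating trigonometric polynomial $S_2(x)$.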
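The coefficients in Example 1 can be checked by direct summation. The sketch below assumes $f(x) = 2x^2 - 9$; the definition of $f$ did not survive in the text of the example, and this choice is an assumption that happens to reproduce the three printed values of $a_k$. The names `f`, `nodes`, and `S2` are illustrative.

```python
import math


def f(x):
    # Assumed data function (not stated in the surviving text of Example 1).
    return 2 * x**2 - 9


m = 2
nodes = [-math.pi + j * math.pi / m for j in range(2 * m)]  # -pi, -pi/2, 0, pi/2

# a_k = (1/m) * sum f(x_j) cos(k x_j) for k = 0, 1, 2 and b_1 = (1/m) * sum f(x_j) sin(x_j)
a = [sum(f(x) * math.cos(k * x) for x in nodes) / m for k in range(m + 1)]
b1 = sum(f(x) * math.sin(x) for x in nodes) / m


def S2(x):
    """Interpolating trigonometric polynomial of degree 2."""
    return (a[0] + a[2] * math.cos(2 * x)) / 2 + a[1] * math.cos(x) + b1 * math.sin(x)
```

Evaluating `S2` at each node should return the corresponding data value, which is a quick way to confirm that the polynomial interpolates.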


Figure 8.16 


The next example illustrates how to find an interpolating trigonometric polynomial
for a function that is defined on a closed interval other than $[-\pi, \pi]$. 
Example 2 Determine the trigonometric interpolating polynomial of degree 4 on [0, 2] for the data
$\{(j/4, f(j/4))\}_{j=0}^{7}$, where $f(x) = x^4 - 3x^3 + 2x^2 - \tan x(x - 2)$.


Solution We first need to transform the interval [0, 2] to $[-\pi, \pi]$. This is given by 

\[ z_j = \pi(x_j - 1), \]

so that the input data to Algorithm 8.3 are 

\[ \left\{ z_j,\ f\!\left(1 + \frac{z_j}{\pi}\right) \right\}_{j=0}^{7}. \]

The interpolating polynomial in $z$ is 

\[ S_4(z) = 0.761979 + 0.771841\cos z + 0.0173037\cos 2z + 0.00686304\cos 3z - 0.000578545\cos 4z - 0.386374\sin z + 0.0468750\sin 2z - 0.0113738\sin 3z. \]

The trigonometric polynomial $S_4(x)$ on [0, 2] is obtained by substituting $z = \pi(x - 1)$
into $S_4(z)$. The graphs of $y = f(x)$ and $y = S_4(x)$ are shown in Figure 8.17. Values of $f(x)$
and $S_4(x)$ are given in Table 8.14.

Figure 8.17 

Table 8.14

x        f(x)        S_4(x)      |f(x) - S_4(x)|
0.125    0.26440     0.25001     1.44 × 10^-2
0.375    0.84081     0.84647     5.66 × 10^-3
0.625    1.36150     1.35824     3.27 × 10^-3
0.875    1.61282     1.61515     2.33 × 10^-3
1.125    1.36672     1.36471     2.02 × 10^-3
1.375    0.71697     0.71931     2.33 × 10^-3
1.625    0.07909     0.07496     4.14 × 10^-3
1.875   -0.14576    -0.13301     1.27 × 10^-2
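The tabulated values can be spot-checked by evaluating $f$ and $S_4$ directly. The sketch below reads the last term of $f$ as $\tan(x(x-2))$, an interpretation (an assumption about the notation) that agrees with the $f(x)$ column above; the coefficient lists are copied from the printed $S_4(z)$.

```python
import math


def f(x):
    # Last term read as tan(x*(x - 2)); this reading matches the tabulated f values.
    return x**4 - 3 * x**3 + 2 * x**2 - math.tan(x * (x - 2))


# Constant term followed by the coefficients of cos z, cos 2z, cos 3z, cos 4z.
A = [0.761979, 0.771841, 0.0173037, 0.00686304, -0.000578545]
# Coefficients of sin z, sin 2z, sin 3z.
B = [-0.386374, 0.0468750, -0.0113738]


def S4(x):
    """Evaluate S_4 on [0, 2] through the substitution z = pi*(x - 1)."""
    z = math.pi * (x - 1)
    value = A[0]
    value += sum(A[k] * math.cos(k * z) for k in range(1, 5))
    value += sum(B[k - 1] * math.sin(k * z) for k in range(1, 4))
    return value


# Rows of (x, f(x), S_4(x), |f(x) - S_4(x)|) at the points used in the table.
rows = [(x, f(x), S4(x), abs(f(x) - S4(x)))
        for x in (0.125, 0.375, 0.625, 0.875, 1.125, 1.375, 1.625, 1.875)]
```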
More details on the verification of the validity of the fast Fourier transform procedure 
can be found in [Ham], which presents the method from a mathematical approach, or in 
[Brac], where the presentation is based on methods more likely to be familiar to engineers. 
[AHU], pp. 252-269, is a good reference for a discussion of the computational aspects of 
the method. Modification of the procedure for the case when m is not a power of 2 can be 
found in [Win]. A presentation of the techniques and related material from the point of view 
of applied abstract algebra is given in [Lau, pp. 438-465]. 


EXERCISE SET 8.6 


1. Determine the trigonometric interpolating polynomial $S_2(x)$ of degree 2 on $[-\pi, \pi]$ for the following
functions, and graph $f(x) - S_2(x)$:
a. $f(x) = \pi(x - \pi)$
b. $f(x) = x(\pi - x)$
c. $f(x) = |x|$
d. $f(x) = \begin{cases} -1, & -\pi \le x \le 0 \\ 1, & 0 < x \le \pi \end{cases}$

2. Determine the trigonometric interpolating polynomial of degree 4 for $f(x) = x(\pi - x)$ on the interval
$[-\pi, \pi]$ using:
a. Direct calculation;
b. The Fast Fourier Transform Algorithm.

3. Use the Fast Fourier Transform Algorithm to compute the trigonometric interpolating polynomial of
degree 4 on $[-\pi, \pi]$ for the following functions.
a. $f(x) = \pi(x - \pi)$
b. $f(x) = |x|$
c. $f(x) = \cos \pi x - 2\sin \pi x$
d. $f(x) = x\cos x^2 + e^x \cos e^x$

4. a. Determine the trigonometric interpolating polynomial $S_4(x)$ of degree 4 for $f(x) = x^2 \sin x$ on
the interval [0, 1].
b. Compute $\int_0^1 S_4(x)\,dx$.
c. Compare the integral in part (b) to $\int_0^1 x^2 \sin x\,dx$.

5. Use the approximations obtained in Exercise 3 to approximate the following integrals, and compare
your results to the actual values.
a. $\int_{-\pi}^{\pi} \pi(x - \pi)\,dx$
b. $\int_{-\pi}^{\pi} |x|\,dx$
c. $\int_{-\pi}^{\pi} (\cos \pi x - 2\sin \pi x)\,dx$
d. $\int_{-\pi}^{\pi} (x\cos x^2 + e^x \cos e^x)\,dx$

6. Use the Fast Fourier Transform Algorithm to determine the trigonometric interpolating polynomial
of degree 16 for $f(x) = x^2 \cos x$ on $[-\pi, \pi]$.

7. Use the Fast Fourier Transform Algorithm to determine the trigonometric interpolating polynomial
of degree 64 for $f(x) = x^2 \cos x$ on $[-\pi, \pi]$.

8. Use a trigonometric identity to show that $\sum_{j=0}^{2m-1} (\cos m x_j)^2 = 2m$.

9. Show that $c_0, \ldots, c_{2m-1}$ in Algorithm 8.3 are given by

\[ \begin{bmatrix} c_0 \\ c_1 \\ c_2 \\ \vdots \\ c_{2m-1} \end{bmatrix} = \begin{bmatrix} 1 & 1 & 1 & \cdots & 1 \\ 1 & \zeta & \zeta^2 & \cdots & \zeta^{2m-1} \\ 1 & \zeta^2 & \zeta^4 & \cdots & \zeta^{4m-2} \\ \vdots & \vdots & \vdots & & \vdots \\ 1 & \zeta^{2m-1} & \zeta^{4m-2} & \cdots & \zeta^{(2m-1)^2} \end{bmatrix} \begin{bmatrix} y_0 \\ y_1 \\ y_2 \\ \vdots \\ y_{2m-1} \end{bmatrix}, \]

where $\zeta = e^{\pi i/m}$.

10. In the discussion preceding Algorithm 8.3, an example for $m = 4$ was explained. Define vectors c, d,
e, f, and y as

\[ \mathbf{c} = (c_0, \ldots, c_7)^t, \quad \mathbf{d} = (d_0, \ldots, d_7)^t, \quad \mathbf{e} = (e_0, \ldots, e_7)^t, \quad \mathbf{f} = (f_0, \ldots, f_7)^t, \quad \mathbf{y} = (y_0, \ldots, y_7)^t. \]

Find matrices A, B, C, and D so that c = Ad, d = Be, e = Cf, and f = Dy. 


8.7 Survey of Methods and Software 


In this chapter we have considered approximating data and functions with elementary func- 
tions. The elementary functions used were polynomials, rational functions, and trigono- 
metric polynomials. We considered two types of approximations, discrete and continuous. 
Discrete approximations arise when approximating a finite set of data with an elementary 
function. Continuous approximations are used when the function to be approximated is 
known. 

Discrete least squares techniques are recommended when the function is specified by 
giving a set of data that may not exactly represent the function. Least squares fit of data 
can take the form of a linear or other polynomial approximation or even an exponential 
form. These approximations are computed by solving sets of normal equations, as given in 
Section 8.1. 

If the data are periodic, a trigonometric least squares fit may be appropriate. Because 
of the orthonormality of the trigonometric basis functions, the least squares trigonometric 
approximation does not require the solution of a linear system. For large amounts of pe- 
riodic data, interpolation by trigonometric polynomials is also recommended. An efficient 
method of computing the trigonometric interpolating polynomial is given by the fast Fourier 
transform. 

When the function to be approximated can be evaluated at any required argument, the 
approximations seek to minimize an integral instead of a sum. The continuous least squares 
polynomial approximations were considered in Section 8.2. Efficient computation of least 
squares polynomials leads to orthogonal sets of polynomials, such as the Legendre and 
Chebyshev polynomials. Approximation by rational functions was studied in Section 8.4, 
where Padé approximation as a generalization of the Maclaurin polynomial and its extension 
to Chebyshev rational approximation were presented. Both methods allow a more uniform 
method of approximation than polynomials. Continuous least squares approximation by 
trigonometric functions was discussed in Section 8.5, especially as it relates to Fourier 
series. 

The IMSL Library provides a number of routines for approximation including 


1. Linear least squares fit of data with statistics; 

2. Discrete least squares fit of data with the user’s choice of basis functions; 
3. Cubic spline least squares approximation; 

4. Rational weighted Chebyshev approximation; 

5. Fast Fourier transform fit of data. 

The NAG Library provides routines that include computing 


1. Least square polynomial approximation using a technique to minimize round-off 
error; 


2. Cubic spline least squares approximation; 
3. Best fit in the $l_1$ sense; 
4. Best fit in the $l_\infty$ sense; 
5. Fast Fourier transform fit of data. 


The netlib library contains a routine to compute the polynomial least squares approx- 
imation to a discrete set of points, and a routine to evaluate this polynomial and any of its 
derivatives at a given point. 

For further information on the general theory of approximation see Powell [Pow], 
Davis [Da], or Cheney [Ch]. A good reference for methods of least squares is Lawson and 
Hanson [LH], and information about Fourier transforms can be found in Van Loan [Van] 
and in Briggs and Hanson [BH]. 
Approximating Eigenvalues 


Introduction 


The longitudinal vibrations of an elastic bar of local stiffness $p(x)$ and density $\rho(x)$ are 
described by the partial differential equation 

\[ \rho(x)\frac{\partial^2 v}{\partial t^2}(x, t) = \frac{\partial}{\partial x}\left[ p(x)\frac{\partial v}{\partial x}(x, t) \right], \]

where $v(x, t)$ is the mean longitudinal displacement of a section of the bar from its 
equilibrium position $x$ at time $t$. The vibrations can be written as a sum of simple harmonic 
vibrations: 

\[ v(x, t) = \sum_{k=0}^{\infty} c_k u_k(x)\cos\sqrt{\lambda_k}\,(t - t_0), \]

where 

\[ \frac{d}{dx}\left[ p(x)\frac{du_k}{dx}(x) \right] + \lambda_k \rho(x) u_k(x) = 0. \]

If the bar has length $l$ and is fixed at its ends, then this differential equation holds for 
$0 < x < l$ and $v(0) = v(l) = 0$. 


[Figure: displacement $v(x)$ of the bar at a fixed time $t$, together with $v(x, 0)$.]


A system of these differential equations is called a Sturm-Liouville system, and the numbers 
$\lambda_k$ are eigenvalues with corresponding eigenfunctions $u_k(x)$. 

Suppose the bar is 1 m long with uniform stiffness $p(x) = p$ and uniform density 
$\rho(x) = \rho$. To approximate $u$ and $\lambda$, let $h = 0.2$. Then $x_j = 0.2j$, for $0 \le j \le 5$, and we 
can use the midpoint formula (4.5) in Section 4.1 to approximate the first derivatives. This 
gives the linear system 


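\[ A\mathbf{w} = \begin{bmatrix} 2 & -1 & 0 & 0 \\ -1 & 2 & -1 & 0 \\ 0 & -1 & 2 & -1 \\ 0 & 0 & -1 & 2 \end{bmatrix} \begin{bmatrix} w_1 \\ w_2 \\ w_3 \\ w_4 \end{bmatrix} = 0.04\,\frac{\rho}{p}\,\lambda \begin{bmatrix} w_1 \\ w_2 \\ w_3 \\ w_4 \end{bmatrix} = 0.04\,\frac{\rho}{p}\,\lambda\,\mathbf{w}. \]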
In this system, $w_j \approx u(x_j)$, for $1 \le j \le 4$, and $w_0 = w_5 = 0$. The four eigenvalues 
of A approximate the eigenvalues of the Sturm-Liouville system. It is the approximation of 
eigenvalues that we will consider in this chapter. A Sturm-Liouville application is discussed 
in Exercise 13 of Section 9.5. 
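To make the connection between the 4 × 4 matrix and the differential equation concrete, the sketch below uses the known closed form for the eigenpairs of the tridiagonal matrix with 2 on the diagonal and −1 beside it: $\mu_k = 2 - 2\cos(k\pi/5)$ with eigenvector components $\sin(jk\pi/5)$. It assumes $p = \rho = 1$ and the scaling $\lambda \approx \mu/h^2$ that results when $u''$ is replaced by a centered difference; both are assumptions made for illustration. For a uniform bar of length 1 the exact eigenvalues are then $(k\pi)^2$.

```python
import math

n = 4
h = 0.2
A = [[2, -1, 0, 0],
     [-1, 2, -1, 0],
     [0, -1, 2, -1],
     [0, 0, -1, 2]]


def matvec(M, v):
    return [sum(M[i][j] * v[j] for j in range(len(v))) for i in range(len(M))]


results = []
for k in range(1, n + 1):
    mu = 2 - 2 * math.cos(k * math.pi / (n + 1))                    # eigenvalue of A
    w = [math.sin(j * k * math.pi / (n + 1)) for j in range(1, n + 1)]  # eigenvector of A
    residual = max(abs(r - mu * wj) for r, wj in zip(matvec(A, w), w))
    lam_approx = mu / h**2            # approximate Sturm-Liouville eigenvalue (p = rho = 1)
    lam_exact = (k * math.pi) ** 2    # exact eigenvalue for the uniform bar of length 1
    results.append((mu, lam_approx, lam_exact, residual))
```

With only four interior points the smallest eigenvalue is approximated fairly well (about 9.55 against $\pi^2 \approx 9.87$), while the larger ones are noticeably less accurate.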


9.1 Linear Algebra and Eigenvalues 


Eigenvalues and eigenvectors were introduced in Chapter 7 in connection with the 
convergence of iterative methods for approximating the solution to a linear system. To determine 
the eigenvalues of an $n \times n$ matrix $A$, we construct the characteristic polynomial 

\[ p(\lambda) = \det(A - \lambda I) \]

and then determine its zeros. Finding the determinant of an $n \times n$ matrix is computationally 
expensive, and finding good approximations to the roots of $p(\lambda)$ is also difficult. In this 
chapter we will explore other means for approximating the eigenvalues of a matrix. In 
Section 9.6 we give an introduction to a technique for factoring a general $m \times n$ matrix into 
a form that has valuable applications in a number of areas. 

In Chapter 7 we found that an iterative technique for solving a linear system will 
converge if all the eigenvalues associated with the problem have magnitude less than 1. 
The exact values of the eigenvalues in this case are not of primary importance; what matters is the 
region of the complex plane in which they lie. An important result in this regard was first 
discovered by S. A. Geršgorin. It is the subject of a very interesting book by Richard Varga 
[Var2]. 


Theorem 9.1 (Geršgorin Circle) 

Let $A$ be an $n \times n$ matrix and $R_i$ denote the circle in the complex plane with center $a_{ii}$ and 
radius $\sum_{j=1,\, j \ne i}^{n} |a_{ij}|$; that is, 

\[ R_i = \left\{ z \in \mathcal{C} \;\Big|\; |z - a_{ii}| \le \sum_{j=1,\, j \ne i}^{n} |a_{ij}| \right\}, \]

where $\mathcal{C}$ denotes the complex plane. The eigenvalues of $A$ are contained within the union of 
these circles, $R = \cup_{i=1}^{n} R_i$. Moreover, the union of any $k$ of the circles that do not intersect 
the remaining $(n - k)$ contains precisely $k$ (counting multiplicities) of the eigenvalues. 

Semyon Aranovich Geršgorin (1901-1933) worked at the Petrograd Technological Institute 
until 1930, when he moved to the Leningrad Mechanical Engineering Institute. His 1931 
paper Über die Abgrenzung der Eigenwerte einer Matrix ([Ger]) included what is now known as 
his Circle Theorem. 


Proof Suppose that $\lambda$ is an eigenvalue of $A$ with associated eigenvector $\mathbf{x}$, where $\|\mathbf{x}\|_\infty = 1$. 
Since $A\mathbf{x} = \lambda\mathbf{x}$, the equivalent component representation is 

\[ \sum_{j=1}^{n} a_{ij} x_j = \lambda x_i, \quad\text{for each } i = 1, 2, \ldots, n. \qquad (9.1) \]

Let $k$ be an integer with $|x_k| = \|\mathbf{x}\|_\infty = 1$. When $i = k$, Eq. (9.1) implies that 

\[ \sum_{j=1}^{n} a_{kj} x_j = \lambda x_k. \]

Thus 

\[ \sum_{j=1,\, j \ne k}^{n} a_{kj} x_j = \lambda x_k - a_{kk} x_k = (\lambda - a_{kk}) x_k, \]

and 

\[ |\lambda - a_{kk}| \cdot |x_k| = \left| \sum_{j=1,\, j \ne k}^{n} a_{kj} x_j \right| \le \sum_{j=1,\, j \ne k}^{n} |a_{kj}||x_j|. \]

But $|x_k| = \|\mathbf{x}\|_\infty = 1$, so $|x_j| \le |x_k| = 1$ for all $j = 1, 2, \ldots, n$. Hence 

\[ |\lambda - a_{kk}| \le \sum_{j=1,\, j \ne k}^{n} |a_{kj}|. \]

This proves the first assertion in the theorem, that $\lambda \in R_k$. A proof of the second statement 
is contained in [Var2], p. 8, or in [Or2], p. 48. 


Example 1 Determine the Geršgorin circles for the matrix 

\[ A = \begin{bmatrix} 4 & 1 & 1 \\ 0 & 2 & 1 \\ -2 & 0 & 9 \end{bmatrix}, \]

and use these to find bounds for the spectral radius of $A$. 


Solution The circles in the Geršgorin Theorem are (see Figure 9.1) 

\[ R_1 = \{ z \in \mathcal{C} \mid |z - 4| \le 2 \}, \quad R_2 = \{ z \in \mathcal{C} \mid |z - 2| \le 1 \}, \quad\text{and}\quad R_3 = \{ z \in \mathcal{C} \mid |z - 9| \le 2 \}. \]

Because $R_1$ and $R_2$ are disjoint from $R_3$, there are precisely two eigenvalues within $R_1 \cup R_2$ 
and one within $R_3$. Moreover, $\rho(A) = \max_{1 \le i \le 3} |\lambda_i|$, so $7 \le \rho(A) \le 11$. 
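The circles in Example 1 are simple to compute from the rows of the matrix. The following Python sketch (the function name `gershgorin_circles` is an illustrative assumption) returns each center $a_{ii}$ with its radius and then forms the two bounds on the spectral radius used in the example.

```python
def gershgorin_circles(A):
    """Return a list of (center, radius) pairs, one per row of the square matrix A."""
    circles = []
    for i, row in enumerate(A):
        center = row[i]
        radius = sum(abs(v) for j, v in enumerate(row) if j != i)
        circles.append((center, radius))
    return circles


A = [[4, 1, 1],
     [0, 2, 1],
     [-2, 0, 9]]

circles = gershgorin_circles(A)

# Every eigenvalue lies in some circle, so |lambda| <= |center| + radius for that circle.
upper = max(abs(c) + r for c, r in circles)

# Specific to this example: R_3 is disjoint from the other circles, so it holds exactly
# one eigenvalue, whose magnitude is at least |center| - radius.
c3, r3 = circles[2]
lower = abs(c3) - r3
```

The lower bound relies on the disjointness of $R_3$ observed in the example; it is not a general formula.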


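Figure 9.1 
[Figure: the Geršgorin circles in the complex plane (real axis horizontal, imaginary axis vertical); the region $R_1 \cup R_2$ is labeled "Two eigenvalues" and $R_3$ is labeled "One eigenvalue".]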


Even when we need to find the eigenvalues, many techniques for their approximation are 
iterative. Determining regions in which they lie is the first step in finding the approximation, 
because it provides us with an initial approximation. 

Before considering further results concerning eigenvalues and eigenvectors, we need 
some definitions and results from linear algebra. All the general results that will be needed 
in the remainder of this chapter are listed here for ease of reference. The proofs of many 
of the results that are not given are considered in the exercises, and all can be found in 
most standard texts on linear algebra (see, for example, [ND], [Poo], or [DG]). 

The first definition parallels the definition for the linear independence of functions 
described in Section 8.2. In fact, much of what we will see in this section parallels material 
in Chapter 8. 
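Definition 9.2 Let $\{\mathbf{v}^{(1)}, \mathbf{v}^{(2)}, \ldots, \mathbf{v}^{(k)}\}$ be a set of vectors. The set is linearly independent if whenever 

\[ \mathbf{0} = \alpha_1 \mathbf{v}^{(1)} + \alpha_2 \mathbf{v}^{(2)} + \alpha_3 \mathbf{v}^{(3)} + \cdots + \alpha_k \mathbf{v}^{(k)}, \]

then $\alpha_i = 0$, for each $i = 1, 2, \ldots, k$. Otherwise the set of vectors is linearly dependent. 

Note that any set of vectors containing the zero vector is linearly dependent. 


Theorem 9.3 Suppose that $\{\mathbf{v}^{(1)}, \mathbf{v}^{(2)}, \ldots, \mathbf{v}^{(n)}\}$ is a set of $n$ linearly independent vectors in $\mathbb{R}^n$. Then 
for any vector $\mathbf{x} \in \mathbb{R}^n$ a unique collection of constants $\beta_1, \beta_2, \ldots, \beta_n$ exists with 

\[ \mathbf{x} = \beta_1 \mathbf{v}^{(1)} + \beta_2 \mathbf{v}^{(2)} + \beta_3 \mathbf{v}^{(3)} + \cdots + \beta_n \mathbf{v}^{(n)}. \]

Proof Let $A$ be the matrix whose columns are the vectors $\mathbf{v}^{(1)}, \mathbf{v}^{(2)}, \ldots, \mathbf{v}^{(n)}$. Then the set 
$\{\mathbf{v}^{(1)}, \mathbf{v}^{(2)}, \ldots, \mathbf{v}^{(n)}\}$ is linearly independent if and only if the matrix equation 

\[ A(\alpha_1, \alpha_2, \ldots, \alpha_n)^t = \mathbf{0} \quad\text{has the unique solution}\quad (\alpha_1, \alpha_2, \ldots, \alpha_n)^t = \mathbf{0}. \]

But by Theorem 6.16 on page 397, this is equivalent to the matrix equation $A(\beta_1, \beta_2, \ldots, \beta_n)^t = \mathbf{x}$ 
having a unique solution for any vector $\mathbf{x} \in \mathbb{R}^n$. This, in turn, is equivalent to the 
statement that for any vector $\mathbf{x} \in \mathbb{R}^n$ a unique collection of constants $\beta_1, \beta_2, \ldots, \beta_n$ exists 
with 

\[ \mathbf{x} = \beta_1 \mathbf{v}^{(1)} + \beta_2 \mathbf{v}^{(2)} + \beta_3 \mathbf{v}^{(3)} + \cdots + \beta_n \mathbf{v}^{(n)}. \]


Definition 9.4 Any collection of $n$ linearly independent vectors in $\mathbb{R}^n$ is called a basis for $\mathbb{R}^n$. 


Example 2 (a) Show that $\mathbf{v}^{(1)} = (1, 0, 0)^t$, $\mathbf{v}^{(2)} = (-1, 1, 1)^t$, and $\mathbf{v}^{(3)} = (0, 4, 2)^t$ is a basis for $\mathbb{R}^3$, and 
(b) given an arbitrary vector $\mathbf{x} \in \mathbb{R}^3$ find $\beta_1$, $\beta_2$, and $\beta_3$ with 

\[ \mathbf{x} = \beta_1 \mathbf{v}^{(1)} + \beta_2 \mathbf{v}^{(2)} + \beta_3 \mathbf{v}^{(3)}. \]

Solution (a) Let $\alpha_1$, $\alpha_2$, and $\alpha_3$ be numbers with $\mathbf{0} = \alpha_1 \mathbf{v}^{(1)} + \alpha_2 \mathbf{v}^{(2)} + \alpha_3 \mathbf{v}^{(3)}$. Then 

\[ (0, 0, 0)^t = \alpha_1 (1, 0, 0)^t + \alpha_2 (-1, 1, 1)^t + \alpha_3 (0, 4, 2)^t = (\alpha_1 - \alpha_2,\ \alpha_2 + 4\alpha_3,\ \alpha_2 + 2\alpha_3)^t, \]

so $\alpha_1 - \alpha_2 = 0$, $\alpha_2 + 4\alpha_3 = 0$, and $\alpha_2 + 2\alpha_3 = 0$. 
The only solution to this system is $\alpha_1 = \alpha_2 = \alpha_3 = 0$, so this set $\{\mathbf{v}^{(1)}, \mathbf{v}^{(2)}, \mathbf{v}^{(3)}\}$ of 3 
linearly independent vectors in $\mathbb{R}^3$ is a basis for $\mathbb{R}^3$. 

(b) Let $\mathbf{x} = (x_1, x_2, x_3)^t$ be a vector in $\mathbb{R}^3$. Solving 

\[ \mathbf{x} = \beta_1 \mathbf{v}^{(1)} + \beta_2 \mathbf{v}^{(2)} + \beta_3 \mathbf{v}^{(3)} = \beta_1 (1, 0, 0)^t + \beta_2 (-1, 1, 1)^t + \beta_3 (0, 4, 2)^t = (\beta_1 - \beta_2,\ \beta_2 + 4\beta_3,\ \beta_2 + 2\beta_3)^t \]

is equivalent to solving for $\beta_1$, $\beta_2$, and $\beta_3$ in the system 

\[ \beta_1 - \beta_2 = x_1, \quad \beta_2 + 4\beta_3 = x_2, \quad \beta_2 + 2\beta_3 = x_3. \]

This system has the unique solution 

\[ \beta_1 = x_1 - x_2 + 2x_3, \quad \beta_2 = 2x_3 - x_2, \quad\text{and}\quad \beta_3 = \frac{1}{2}(x_2 - x_3). \]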
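The coordinates of a vector relative to a basis, as in part (b) of Example 2, can be found by solving the linear system whose coefficient columns are the basis vectors. The sketch below does this with Gauss-Jordan elimination in exact rational arithmetic; the function name `coordinates` is an illustrative assumption, and the routine simply fails (raising `StopIteration`) if the given vectors are linearly dependent.

```python
from fractions import Fraction


def coordinates(basis, x):
    """Return [beta_1, ..., beta_n] with x = sum(beta_i * basis[i]), using exact arithmetic."""
    n = len(x)
    # Augmented matrix [V | x], where column c of V is basis[c].
    aug = [[Fraction(basis[c][r]) for c in range(n)] + [Fraction(x[r])] for r in range(n)]
    for col in range(n):
        # Find a row at or below the diagonal with a nonzero entry in this column.
        pivot = next(r for r in range(col, n) if aug[r][col] != 0)
        aug[col], aug[pivot] = aug[pivot], aug[col]
        piv = aug[col][col]
        aug[col] = [v / piv for v in aug[col]]
        for r in range(n):
            if r != col and aug[r][col] != 0:
                factor = aug[r][col]
                aug[r] = [vr - factor * vc for vr, vc in zip(aug[r], aug[col])]
    return [aug[r][n] for r in range(n)]


basis = [(1, 0, 0), (-1, 1, 1), (0, 4, 2)]
x = (1, 2, 3)
beta = coordinates(basis, x)

# Closed-form solution from Example 2(b), for comparison.
formula = [x[0] - x[1] + 2 * x[2], 2 * x[2] - x[1], Fraction(x[1] - x[2], 2)]
```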
The next result will be used in Section 9.3 to develop the Power method for 
approximating eigenvalues. A proof of this result is considered in Exercise 10. 


Theorem 9.5 If $A$ is a matrix and $\lambda_1, \ldots, \lambda_k$ are distinct eigenvalues of $A$ with associated eigenvectors 
$\mathbf{x}^{(1)}, \mathbf{x}^{(2)}, \ldots, \mathbf{x}^{(k)}$, then $\{\mathbf{x}^{(1)}, \mathbf{x}^{(2)}, \ldots, \mathbf{x}^{(k)}\}$ is a linearly independent set. 


Example 3 Show that a basis can be formed for $\mathbb{R}^3$ using the eigenvectors of the 3 × 3 matrix 

\[ A = \begin{bmatrix} 2 & 0 & 0 \\ 1 & 1 & 2 \\ 1 & -1 & 4 \end{bmatrix}. \]

Solution In Example 2 of Section 7.2 we found that $A$ has the characteristic polynomial 

\[ p(\lambda) = \det(A - \lambda I) = -(\lambda - 3)(\lambda - 2)^2. \]

Hence there are two distinct eigenvalues of $A$: $\lambda_1 = 3$ and $\lambda_2 = 2$. In that example we 
also found that $\lambda_1 = 3$ has the eigenvector $\mathbf{x}_1 = (0, 1, 1)^t$, and that there are two linearly 
independent eigenvectors $\mathbf{x}_2 = (0, 2, 1)^t$ and $\mathbf{x}_3 = (-2, 0, 1)^t$ corresponding to $\lambda_2 = 2$. 
It is not difficult to show (see Exercise 8) that this set of three eigenvectors 

\[ \{\mathbf{x}_1, \mathbf{x}_2, \mathbf{x}_3\} = \{(0, 1, 1)^t, (0, 2, 1)^t, (-2, 0, 1)^t\} \]

is linearly independent and hence forms a basis for $\mathbb{R}^3$. 
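The claims in Example 3 can be verified with integer arithmetic: each vector should satisfy $A\mathbf{x} = \lambda\mathbf{x}$, and the matrix with the three eigenvectors as columns should have a nonzero determinant. The helper names in this sketch are illustrative assumptions.

```python
A = [[2, 0, 0],
     [1, 1, 2],
     [1, -1, 4]]

# (eigenvalue, eigenvector) pairs quoted in Example 3.
eigenpairs = [(3, (0, 1, 1)), (2, (0, 2, 1)), (2, (-2, 0, 1))]


def matvec(M, v):
    return tuple(sum(M[i][j] * v[j] for j in range(3)) for i in range(3))


def det3(M):
    """Determinant of a 3 x 3 matrix by cofactor expansion along the first row."""
    return (M[0][0] * (M[1][1] * M[2][2] - M[1][2] * M[2][1])
            - M[0][1] * (M[1][0] * M[2][2] - M[1][2] * M[2][0])
            + M[0][2] * (M[1][0] * M[2][1] - M[1][1] * M[2][0]))


# Check A x = lambda x for each pair.
checks = [matvec(A, x) == tuple(lam * xi for xi in x) for lam, x in eigenpairs]

# Matrix whose columns are the eigenvectors; a nonzero determinant means independence.
P = [[x[i] for _, x in eigenpairs] for i in range(3)]
det_P = det3(P)
```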


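In the next example we will see a matrix whose eigenvalues are the same as those in 
Example 3 but whose eigenvectors have a different character. 


Example 4 Show that no collection of eigenvectors of the 3 × 3 matrix 

\[ B = \begin{bmatrix} 2 & 1 & 0 \\ 0 & 2 & 0 \\ 0 & 0 & 3 \end{bmatrix} \]

can form a basis for $\mathbb{R}^3$. 


Solution This matrix also has the same characteristic polynomial as the matrix $A$ in 
Example 3: 

\[ p(\lambda) = \det \begin{bmatrix} 2 - \lambda & 1 & 0 \\ 0 & 2 - \lambda & 0 \\ 0 & 0 & 3 - \lambda \end{bmatrix} = -(\lambda - 3)(\lambda - 2)^2, \]

so its eigenvalues are the same as those of $A$ in Example 3, that is, $\lambda_1 = 3$ and $\lambda_2 = 2$. 
To determine eigenvectors for $B$ corresponding to the eigenvalue $\lambda_1 = 3$, we need to 
solve the system $(B - 3I)\mathbf{x} = \mathbf{0}$, so 

\[ \begin{bmatrix} 0 \\ 0 \\ 0 \end{bmatrix} = (B - 3I) \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} = \begin{bmatrix} -1 & 1 & 0 \\ 0 & -1 & 0 \\ 0 & 0 & 0 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} = \begin{bmatrix} -x_1 + x_2 \\ -x_2 \\ 0 \end{bmatrix}. \]

Hence $x_2 = 0$, $x_1 = x_2 = 0$, and $x_3$ is arbitrary. Setting $x_3 = 1$ gives the only linearly 
independent eigenvector $(0, 0, 1)^t$ corresponding to $\lambda_1 = 3$. 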
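Consider $\lambda_2 = 2$. If 

\[ \begin{bmatrix} 0 \\ 0 \\ 0 \end{bmatrix} = (B - 2I) \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} = \begin{bmatrix} 0 & 1 & 0 \\ 0 & 0 & 0 \\ 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} = \begin{bmatrix} x_2 \\ 0 \\ x_3 \end{bmatrix}, \]

then $x_2 = 0$, $x_3 = 0$, and $x_1$ is arbitrary. There is only one linearly independent eigenvector 
corresponding to $\lambda_2 = 2$, which can be expressed as $(1, 0, 0)^t$, even though $\lambda_2 = 2$ was a 
zero of multiplicity 2 of the characteristic polynomial of $B$. 

These two eigenvectors are clearly not sufficient to form a basis for $\mathbb{R}^3$. In particular, 
$(0, 1, 0)^t$ is not a linear combination of $\{(0, 0, 1)^t, (1, 0, 0)^t\}$. 


We will see that when the number of linearly independent eigenvectors does not match 
the size of the matrix, as is the case in Example 4, there can be difficulties with the 
approximation methods for finding eigenvalues. 


Orthogonal Vectors 


In Section 8.2 we considered orthogonal and orthonormal sets of functions. Vectors with 
these properties are defined in a similar manner. 


Definition 9.6 A set of vectors $\{\mathbf{v}^{(1)}, \mathbf{v}^{(2)}, \ldots, \mathbf{v}^{(n)}\}$ is called orthogonal if $(\mathbf{v}^{(i)})^t \mathbf{v}^{(j)} = 0$, for all $i \ne j$. If, 
in addition, $(\mathbf{v}^{(i)})^t \mathbf{v}^{(i)} = 1$, for all $i = 1, 2, \ldots, n$, then the set is called orthonormal. 


Because $\mathbf{x}^t \mathbf{x} = \|\mathbf{x}\|_2^2$ for any $\mathbf{x}$ in $\mathbb{R}^n$, a set of orthogonal vectors $\{\mathbf{v}^{(1)}, \mathbf{v}^{(2)}, \ldots, \mathbf{v}^{(n)}\}$ is 
orthonormal if and only if 

\[ \|\mathbf{v}^{(i)}\|_2 = 1, \quad\text{for each } i = 1, 2, \ldots, n. \]


Example 5 (a) Show that the vectors $\mathbf{v}^{(1)} = (0, 4, 2)^t$, $\mathbf{v}^{(2)} = (-5, -1, 2)^t$, and $\mathbf{v}^{(3)} = (1, -1, 2)^t$ form 
an orthogonal set, and (b) use these to determine a set of orthonormal vectors. 

Solution (a) We have $(\mathbf{v}^{(1)})^t \mathbf{v}^{(2)} = 0(-5) + 4(-1) + 2(2) = 0$, 

\[ (\mathbf{v}^{(1)})^t \mathbf{v}^{(3)} = 0(1) + 4(-1) + 2(2) = 0, \quad\text{and}\quad (\mathbf{v}^{(2)})^t \mathbf{v}^{(3)} = -5(1) - 1(-1) + 2(2) = 0, \]

so the vectors are orthogonal, and form a basis for $\mathbb{R}^3$. The $l_2$ norms of these vectors are 

\[ \|\mathbf{v}^{(1)}\|_2 = 2\sqrt{5}, \quad \|\mathbf{v}^{(2)}\|_2 = \sqrt{30}, \quad\text{and}\quad \|\mathbf{v}^{(3)}\|_2 = \sqrt{6}. \]

(b) The vectors 

\[ \mathbf{u}^{(1)} = \frac{\mathbf{v}^{(1)}}{\|\mathbf{v}^{(1)}\|_2} = \left( \frac{0}{2\sqrt{5}}, \frac{4}{2\sqrt{5}}, \frac{2}{2\sqrt{5}} \right)^t = \left( 0, \frac{2\sqrt{5}}{5}, \frac{\sqrt{5}}{5} \right)^t, \]

\[ \mathbf{u}^{(2)} = \frac{\mathbf{v}^{(2)}}{\|\mathbf{v}^{(2)}\|_2} = \left( \frac{-5}{\sqrt{30}}, \frac{-1}{\sqrt{30}}, \frac{2}{\sqrt{30}} \right)^t = \left( -\frac{\sqrt{30}}{6}, -\frac{\sqrt{30}}{30}, \frac{\sqrt{30}}{15} \right)^t, \]

\[ \mathbf{u}^{(3)} = \frac{\mathbf{v}^{(3)}}{\|\mathbf{v}^{(3)}\|_2} = \left( \frac{1}{\sqrt{6}}, \frac{-1}{\sqrt{6}}, \frac{2}{\sqrt{6}} \right)^t = \left( \frac{\sqrt{6}}{6}, -\frac{\sqrt{6}}{6}, \frac{\sqrt{6}}{3} \right)^t \]

form an orthonormal set, since they inherit orthogonality from $\mathbf{v}^{(1)}$, $\mathbf{v}^{(2)}$, and $\mathbf{v}^{(3)}$, and 
additionally, 

\[ \|\mathbf{u}^{(1)}\|_2 = \|\mathbf{u}^{(2)}\|_2 = \|\mathbf{u}^{(3)}\|_2 = 1. \]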
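The computations in Example 5 amount to forming all pairwise inner products and then dividing each vector by its $l_2$ norm. A short Python sketch of that check follows; the names `dot`, `gram`, and `unit` are illustrative assumptions.

```python
import math

vectors = [(0, 4, 2), (-5, -1, 2), (1, -1, 2)]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


# Matrix of inner products: off-diagonal zeros mean the set is orthogonal,
# and the diagonal holds the squared l2 norms 20, 30, and 6.
gram = [[dot(u, v) for v in vectors] for u in vectors]

# Normalize each vector to obtain an orthonormal set.
unit = [tuple(c / math.sqrt(dot(v, v)) for c in v) for v in vectors]
```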
The proof of the next result is considered in Exercise 9. 

Theorem 9.7 An orthogonal set of nonzero vectors is linearly independent. 


The Gram-Schmidt process for constructing a set of polynomials that are orthogonal 
with respect to a given weight function was described in Theorem 8.7 of Section 8.2 (see 
page 515). There is a parallel process, also known as Gram-Schmidt, that permits us to 
construct an orthogonal basis for R” given a set of n linearly independent vectors in R”. 


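Theorem 9.8 Let $\{\mathbf{x}_1, \mathbf{x}_2, \ldots, \mathbf{x}_k\}$ be a set of $k$ linearly independent vectors in $\mathbb{R}^n$. Then $\{\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_k\}$ 
defined by 

\[ \mathbf{v}_1 = \mathbf{x}_1, \]

\[ \mathbf{v}_2 = \mathbf{x}_2 - \left( \frac{\mathbf{v}_1^t \mathbf{x}_2}{\mathbf{v}_1^t \mathbf{v}_1} \right) \mathbf{v}_1, \]

\[ \mathbf{v}_3 = \mathbf{x}_3 - \left( \frac{\mathbf{v}_1^t \mathbf{x}_3}{\mathbf{v}_1^t \mathbf{v}_1} \right) \mathbf{v}_1 - \left( \frac{\mathbf{v}_2^t \mathbf{x}_3}{\mathbf{v}_2^t \mathbf{v}_2} \right) \mathbf{v}_2, \]

\[ \vdots \]

\[ \mathbf{v}_k = \mathbf{x}_k - \sum_{i=1}^{k-1} \left( \frac{\mathbf{v}_i^t \mathbf{x}_k}{\mathbf{v}_i^t \mathbf{v}_i} \right) \mathbf{v}_i \]

is a set of $k$ orthogonal vectors in $\mathbb{R}^n$. 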


The proof of this theorem, discussed in Exercise 16, is a direct verification of the fact 
that for each $1 \le i \le k$ and $1 \le j \le k$, with $i \ne j$, we have $\mathbf{v}_i^t \mathbf{v}_j = 0$. 

Note that when the original set of vectors forms a basis for $\mathbb{R}^n$, that is, when $k = n$, 
then the constructed vectors form an orthogonal basis for $\mathbb{R}^n$. From this we can form an 
orthonormal basis $\{\mathbf{u}_1, \mathbf{u}_2, \ldots, \mathbf{u}_n\}$ simply by defining for each $i = 1, 2, \ldots, n$ 

\[ \mathbf{u}_i = \frac{\mathbf{v}_i}{\|\mathbf{v}_i\|_2}. \]

The following example illustrates how an orthonormal basis for $\mathbb{R}^3$ can be constructed from 
three linearly independent vectors in $\mathbb{R}^3$. 


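Example 6 Use the Gram-Schmidt process to determine a set of orthogonal vectors from the linearly 
independent vectors 

\[ \mathbf{x}^{(1)} = (1, 0, 0)^t, \quad \mathbf{x}^{(2)} = (1, 1, 0)^t, \quad\text{and}\quad \mathbf{x}^{(3)} = (1, 1, 1)^t. \]

Solution We have the orthogonal vectors $\mathbf{v}^{(1)}$, $\mathbf{v}^{(2)}$, and $\mathbf{v}^{(3)}$, given by 

\[ \mathbf{v}^{(1)} = \mathbf{x}^{(1)} = (1, 0, 0)^t, \]

\[ \mathbf{v}^{(2)} = (1, 1, 0)^t - \left( \frac{((1, 0, 0)^t)^t (1, 1, 0)^t}{((1, 0, 0)^t)^t (1, 0, 0)^t} \right) (1, 0, 0)^t = (1, 1, 0)^t - (1, 0, 0)^t = (0, 1, 0)^t, \]

\[ \mathbf{v}^{(3)} = (1, 1, 1)^t - \left( \frac{((1, 0, 0)^t)^t (1, 1, 1)^t}{((1, 0, 0)^t)^t (1, 0, 0)^t} \right) (1, 0, 0)^t - \left( \frac{((0, 1, 0)^t)^t (1, 1, 1)^t}{((0, 1, 0)^t)^t (0, 1, 0)^t} \right) (0, 1, 0)^t \]

\[ = (1, 1, 1)^t - (1, 0, 0)^t - (0, 1, 0)^t = (0, 0, 1)^t. \]

The set $\{\mathbf{v}^{(1)}, \mathbf{v}^{(2)}, \mathbf{v}^{(3)}\}$ happens to be orthonormal as well as orthogonal, but this is not 
commonly the situation. 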
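The Gram-Schmidt formulas translate almost line for line into code. The sketch below is one possible rendering in Python with exact rational arithmetic; the function name `gram_schmidt` is an illustrative assumption, and the routine presumes linearly independent input (a dependent set leads to a division by zero).

```python
from fractions import Fraction


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def gram_schmidt(xs):
    """Return orthogonal vectors v_1, ..., v_k built from the independent vectors in xs."""
    vs = []
    for x in xs:
        v = [Fraction(c) for c in x]
        for w in vs:
            # Subtract the component of x in the direction of each earlier v_i.
            coeff = dot(w, x) / dot(w, w)
            v = [vi - coeff * wi for vi, wi in zip(v, w)]
        vs.append(v)
    return vs


# Vectors of Example 6.
example6 = gram_schmidt([(1, 0, 0), (1, 1, 0), (1, 1, 1)])

# A set for which the result is orthogonal but not orthonormal.
pair = gram_schmidt([(1, 1), (-2, 1)])
```

Dividing each output vector by its $l_2$ norm then gives an orthonormal set.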
The LinearAlgebra package in Maple has a Gram-Schmidt command that returns an 
orthogonal set of vectors, or even an orthonormal set. The command 


GramSchmidt({x1, x2, x3}) 
gives an orthogonal set of vectors, and the command 
GramSchmidt({x1, x2, x3}, normalized) 


produces the orthonormal set. 


EXERCISE SET 9.1 


1. Find the eigenvalues and associated eigenvectors of the following 3 × 3 matrices. Is there a set of 
linearly independent eigenvectors? 


2-3 6 2 0 41 
a A=] 0 3 -4 bp A=| 0 2 0 
0 2 +3 10 2 
111 od el 
ce A=| 1 1 0 d A=|02 1 
101 00 3 


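2. Find the eigenvalues and associated eigenvectors of the following 3 × 3 matrices. Is there a set of 
linearly independent eigenvectors? 

a. $A = \begin{bmatrix} 1 & 0 & 0 \\ -1 & 0 & 1 \\ -1 & -1 & 2 \end{bmatrix}$
b. $A = \begin{bmatrix} 2 & -1 & -1 \\ -1 & 2 & -1 \\ -1 & -1 & 2 \end{bmatrix}$
c. $A = \begin{bmatrix} 2 & 1 & 1 \\ 1 & 2 & 1 \\ 1 & 1 & 2 \end{bmatrix}$
d. $A = \begin{bmatrix} 2 & 1 & 1 \\ 0 & 3 & 1 \\ 0 & 0 & 2 \end{bmatrix}$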


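3. Use the Geršgorin Circle Theorem to determine bounds for the eigenvalues, and the spectral radius 
of the following matrices. 

a. $\begin{bmatrix} 1 & 0 & 0 \\ -1 & 0 & 1 \\ -1 & -1 & 2 \end{bmatrix}$
b. $\begin{bmatrix} 4 & -1 & 0 \\ -1 & 4 & -1 \\ -1 & -1 & 4 \end{bmatrix}$
c. $\begin{bmatrix} 3 & 2 & 1 \\ 2 & 3 & 0 \\ 1 & 0 & 3 \end{bmatrix}$
d. $\begin{bmatrix} 4.75 & 2.25 & -0.25 \\ 2.25 & 4.75 & 1.25 \\ -0.25 & 1.25 & 4.75 \end{bmatrix}$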

4. Use the Geršgorin Circle Theorem to determine bounds for the eigenvalues, and the spectral radius 
of the following matrices. 


= GY 2 4 i. Wve 
0-4 2 a a a 
> i & =o % : Ot 8 <2 
3 1 O ai 10 1 ral 
ft! a: ot ot 
1201 =f # 1.9 
° ie a3 : G 4.4 3 
O13 2 ii @ 2 3 


5. For the matrices in Exercise 1 that have 3 linearly independent eigenvectors form the factorization 
$A = PDP^{-1}$. 

6. For the matrices in Exercise 2 that have 3 linearly independent eigenvectors form the factorization 
$A = PDP^{-1}$. 

7. Show that $\mathbf{v}_1 = (2, -1)^t$, $\mathbf{v}_2 = (1, 1)^t$, and $\mathbf{v}_3 = (1, 3)^t$ are linearly dependent. 

8. Show that the three eigenvectors in Example 3 are linearly independent. 

9. Show that a set $\{\mathbf{v}_1, \ldots, \mathbf{v}_k\}$ of $k$ nonzero orthogonal vectors is linearly independent. 

10. Show that if $A$ is a matrix and $\lambda_1, \lambda_2, \ldots, \lambda_k$ are distinct eigenvalues with associated eigenvectors $\mathbf{x}_1$, 
$\mathbf{x}_2, \ldots, \mathbf{x}_k$, then $\{\mathbf{x}_1, \mathbf{x}_2, \ldots, \mathbf{x}_k\}$ is a linearly independent set. 


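11. Let $\{\mathbf{v}_1, \ldots, \mathbf{v}_n\}$ be a set of orthonormal nonzero vectors in $\mathbb{R}^n$ and $\mathbf{x} \in \mathbb{R}^n$. Determine the values of 
$c_k$, for $k = 1, 2, \ldots, n$, if 

\[ \mathbf{x} = \sum_{k=1}^{n} c_k \mathbf{v}_k. \]

12. Assume that $\{\mathbf{x}_1, \mathbf{x}_2\}$, $\{\mathbf{x}_1, \mathbf{x}_3\}$, and $\{\mathbf{x}_2, \mathbf{x}_3\}$ are all linearly independent. Must $\{\mathbf{x}_1, \mathbf{x}_2, \mathbf{x}_3\}$ be linearly 
independent? 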


13. Consider the following sets of vectors. (i) Show that the set is linearly independent; (ii) use the 
Gram-Schmidt process to find a set of orthogonal vectors; (iii) determine a set of orthonormal vectors from 
the vectors in (ii). 

a. $\mathbf{v}_1 = (1, 1)^t$, $\mathbf{v}_2 = (-2, 1)^t$

b. $\mathbf{v}_1 = (1, 1, 0)^t$, $\mathbf{v}_2 = (1, 0, 1)^t$, $\mathbf{v}_3 = (0, 1, 1)^t$

c. $\mathbf{v}_1 = (1, 1, 1, 1)^t$, $\mathbf{v}_2 = (0, 2, 2, 2)^t$, $\mathbf{v}_3 = (1, 0, 0, 1)^t$

d. $\mathbf{v}_1 = (2, 2, 3, 2, 3)^t$, $\mathbf{v}_2 = (2, -1, 0, -1, 0)^t$, $\mathbf{v}_3 = (0, 0, 1, 0, -1)^t$, $\mathbf{v}_4 = (1, 2, -1, 0, -1)^t$, 
$\mathbf{v}_5 = (0, 1, 0, -1, 0)^t$

14. Consider the following sets of vectors. (i) Show that the set is linearly independent; (ii) use the 
Gram-Schmidt process to find a set of orthogonal vectors; (iii) determine a set of orthonormal vectors from 
the vectors in (ii). 

a. $\mathbf{v}_1 = (2, -1)^t$, $\mathbf{v}_2 = (1, 3)^t$

b. $\mathbf{v}_1 = (2, -1, 1)^t$, $\mathbf{v}_2 = (1, 0, 1)^t$, $\mathbf{v}_3 = (0, 2, 0)^t$

c. $\mathbf{v}_1 = (1, 1, 1, 1)^t$, $\mathbf{v}_2 = (0, 1, 1, 1)^t$, $\mathbf{v}_3 = (0, 0, 1, 0)^t$

d. $\mathbf{v}_1 = (2, 2, 0, 2, 1)^t$, $\mathbf{v}_2 = (-1, 2, 0, -1, 1)^t$, $\mathbf{v}_3 = (0, 1, 0, 1, 0)^t$, $\mathbf{v}_4 = (-1, 0, 0, 1, 1)^t$

15. Use the Geršgorin Circle Theorem to show that a strictly diagonally dominant matrix must be 
nonsingular. 

16. Prove that the set of vectors $\{\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_k\}$ described in the Gram-Schmidt Theorem is 
orthogonal. 


17. A persymmetric matrix is a matrix that is symmetric about both diagonals; that is, an $N \times N$ matrix 
$A = (a_{ij})$ is persymmetric if $a_{ij} = a_{ji} = a_{N+1-i,N+1-j}$, for all $i = 1, 2, \ldots, N$ and $j = 1, 2, \ldots, N$. 
A number of problems in communication theory have solutions that involve the eigenvalues and 
eigenvectors of matrices that are in persymmetric form. For example, the eigenvector corresponding 
to the minimal eigenvalue of the 4 × 4 persymmetric matrix 

\[ A = \begin{bmatrix} 2 & -1 & 0 & 0 \\ -1 & 2 & -1 & 0 \\ 0 & -1 & 2 & -1 \\ 0 & 0 & -1 & 2 \end{bmatrix} \]

gives the unit energy-channel impulse response for a given error sequence of length 2, and subsequently 
the minimum weight of any possible error sequence. 

a. Use the Geršgorin Circle Theorem to show that if $A$ is the matrix given above and $\lambda$ is its minimal 
eigenvalue, then $|\lambda - 4| = \rho(A - 4I)$, where $\rho$ denotes the spectral radius. 

b. Find the minimal eigenvalue of the matrix $A$ by finding all the eigenvalues of $A - 4I$ and computing 
its spectral radius. Then find the corresponding eigenvector. 

c. Use the Geršgorin Circle Theorem to show that if $\lambda$ is the minimal eigenvalue of the matrix 

\[ B = \begin{bmatrix} 3 & -1 & -1 & 1 \\ -1 & 3 & -1 & -1 \\ -1 & -1 & 3 & -1 \\ 1 & -1 & -1 & 3 \end{bmatrix}, \]

then $|\lambda - 6| = \rho(B - 6I)$. 

d. Repeat part (b) using the matrix $B$ and the result in part (c). 
9.2 Orthogonal Matrices and Similarity Transformations 


In this section we will consider the connection between sets of vectors and matrices formed 
using these vectors as their columns. We first consider some results about a class of special 
matrices. The terminology in the next definition follows from the fact that the columns of 
an orthogonal matrix will form an orthogonal set of vectors. 


Definition 9.9 A matrix $Q$ is said to be orthogonal if its columns $\{\mathbf{q}_1^t, \mathbf{q}_2^t, \ldots, \mathbf{q}_n^t\}$ form an orthonormal 
set in $\mathbb{R}^n$. 

It would probably be better to call orthogonal matrices orthonormal because the columns form not 
just an orthogonal but an orthonormal set of vectors. 


The Maple command IsOrthogonal(A) in the LinearAlgebra package returns true if A 
is orthogonal and false otherwise. 

The following important properties of orthogonal matrices are considered in Exercise 16. 


Theorem 9.10 Suppose that $Q$ is an orthogonal $n \times n$ matrix. Then 

(i) $Q$ is invertible with $Q^{-1} = Q^t$; 

(ii) For any $\mathbf{x}$ and $\mathbf{y}$ in $\mathbb{R}^n$, $(Q\mathbf{x})^t Q\mathbf{y} = \mathbf{x}^t \mathbf{y}$; 

(iii) For any $\mathbf{x}$ in $\mathbb{R}^n$, $\|Q\mathbf{x}\|_2 = \|\mathbf{x}\|_2$. 

In addition, the converse of part (i) holds. (See Exercise 18.) That is, 

• any invertible matrix $Q$ with $Q^{-1} = Q^t$ is orthogonal. 


As an example, the permutation matrices discussed in Section 6.5 have this property, so 
they are orthogonal. 

Property (iii) of Theorem 9.10 is often expressed by stating that orthogonal matrices 
are $l_2$-norm preserving. As an immediate consequence of this property, every orthogonal 
matrix $Q$ has $\|Q\|_2 = 1$. 


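Example 1 Show that the matrix 

\[ Q = [\mathbf{u}^{(1)}, \mathbf{u}^{(2)}, \mathbf{u}^{(3)}] = \begin{bmatrix} 0 & -\frac{\sqrt{30}}{6} & \frac{\sqrt{6}}{6} \\ \frac{2\sqrt{5}}{5} & -\frac{\sqrt{30}}{30} & -\frac{\sqrt{6}}{6} \\ \frac{\sqrt{5}}{5} & \frac{\sqrt{30}}{15} & \frac{\sqrt{6}}{3} \end{bmatrix} \]

formed from the orthonormal set of vectors found in Example 5 of Section 9.1 is an 
orthogonal matrix. 


Solution Note that 

\[ QQ^t = \begin{bmatrix} 0 & -\frac{\sqrt{30}}{6} & \frac{\sqrt{6}}{6} \\ \frac{2\sqrt{5}}{5} & -\frac{\sqrt{30}}{30} & -\frac{\sqrt{6}}{6} \\ \frac{\sqrt{5}}{5} & \frac{\sqrt{30}}{15} & \frac{\sqrt{6}}{3} \end{bmatrix} \begin{bmatrix} 0 & \frac{2\sqrt{5}}{5} & \frac{\sqrt{5}}{5} \\ -\frac{\sqrt{30}}{6} & -\frac{\sqrt{30}}{30} & \frac{\sqrt{30}}{15} \\ \frac{\sqrt{6}}{6} & -\frac{\sqrt{6}}{6} & \frac{\sqrt{6}}{3} \end{bmatrix} = I. \]

By Corollary 6.18 in Section 6.4 (see page 398) this is sufficient to ensure that $Q^t = Q^{-1}$. 
So $Q$ is an orthogonal matrix. 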
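A numerical check of Example 1 is straightforward. The following Python sketch builds $Q$ from the normalized vectors of Example 5 in Section 9.1, forms $Q^tQ$, and also tests the norm-preserving property (iii) of Theorem 9.10 on one arbitrarily chosen vector; the helper names and the test vector are illustrative assumptions.

```python
import math

s5, s6, s30 = math.sqrt(5), math.sqrt(6), math.sqrt(30)

# Columns are u^(1), u^(2), u^(3) from Example 5 of Section 9.1.
Q = [[0.0, -s30 / 6, s6 / 6],
     [2 * s5 / 5, -s30 / 30, -s6 / 6],
     [s5 / 5, s30 / 15, s6 / 3]]


def transpose(M):
    return [list(row) for row in zip(*M)]


def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]


def norm2(v):
    return math.sqrt(sum(c * c for c in v))


QtQ = matmul(transpose(Q), Q)  # should be the identity up to rounding

x = [1.0, -2.0, 3.0]           # arbitrary test vector
Qx = [sum(Q[i][j] * x[j] for j in range(3)) for i in range(3)]
```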


The next definition provides the basis for many of the techniques for determining the 
eigenvalues of a matrix. 
Definition 9.11 Two matrices $A$ and $B$ are said to be similar if a nonsingular matrix $S$ exists with $A = S^{-1}BS$. 


An important feature of similar matrices is that they have the same eigenvalues. 


Theorem 9.12 Suppose $A$ and $B$ are similar matrices with $A = S^{-1}BS$ and $\lambda$ is an eigenvalue of $A$ with 
associated eigenvector $\mathbf{x}$. Then $\lambda$ is an eigenvalue of $B$ with associated eigenvector $S\mathbf{x}$. 


Proof Let $\mathbf{x} \ne \mathbf{0}$ be such that 

\[ S^{-1}BS\mathbf{x} = A\mathbf{x} = \lambda\mathbf{x}. \]

Multiplying on the left by the matrix $S$ gives 

\[ BS\mathbf{x} = \lambda S\mathbf{x}. \]

Since $\mathbf{x} \ne \mathbf{0}$ and $S$ is nonsingular, $S\mathbf{x} \ne \mathbf{0}$. Hence, $S\mathbf{x}$ is an eigenvector of $B$ corresponding 
to its eigenvalue $\lambda$. 


The Maple command IsSimilar(A, B) in the LinearAlgebra package returns true if A 
and B are similar and false otherwise. 
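Theorem 9.12 can be illustrated on a small case. In the sketch below the matrices $B$ and $S$ are made-up examples (assumptions, not taken from the text); $A$ is formed as $S^{-1}BS$, and each eigenpair $(\lambda, \mathbf{x})$ of $A$ is checked to give an eigenpair $(\lambda, S\mathbf{x})$ of $B$.

```python
def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]


def matvec(M, v):
    return [sum(M[i][j] * v[j] for j in range(len(v))) for i in range(len(M))]


B = [[2, 1],
     [0, 3]]
S = [[1, 2],
     [1, 3]]        # det S = 1, so S is nonsingular
S_inv = [[3, -2],
         [-1, 1]]   # inverse of S

A = matmul(S_inv, matmul(B, S))   # A = S^{-1} B S

# Eigenpairs of A, found by hand for this small example.
eigenpairs_of_A = [(3, [1, 0]), (2, [-3, 1])]

report = []
for lam, x in eigenpairs_of_A:
    Sx = matvec(S, x)
    is_pair_of_A = matvec(A, x) == [lam * xi for xi in x]
    is_pair_of_B = matvec(B, Sx) == [lam * si for si in Sx]
    report.append((lam, Sx, is_pair_of_A, is_pair_of_B))
```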

A particularly important use of similarity occurs when an $n \times n$ matrix $A$ is similar to 
a diagonal matrix, that is, when a diagonal matrix $D$ and an invertible matrix $S$ exist with 

\[ A = S^{-1}DS \quad\text{or equivalently}\quad D = SAS^{-1}. \]

In this case the matrix $A$ is said to be diagonalizable. The following result is considered in 
Exercise 19. 


Theorem 9.13 An $n \times n$ matrix $A$ is similar to a diagonal matrix $D$ if and only if $A$ has $n$ linearly independent 
eigenvectors. In this case, $D = S^{-1}AS$, where the columns of $S$ consist of the eigenvectors, 
and the $i$th diagonal element of $D$ is the eigenvalue of $A$ that corresponds to the $i$th column 
of $S$. 


The pair of matrices $S$ and $D$ is not unique. For example, any reordering of the columns 
of $S$ and corresponding reordering of the diagonal elements of $D$ will give a distinct pair. 
See Exercise 15 for an illustration. 

We saw in Theorem 9.5 that the eigenvectors of a matrix that correspond to distinct 
eigenvalues form a linearly independent set. As a consequence we have the following corollary 
to Theorem 9.13. 


Corollary 9.14 An $n \times n$ matrix $A$ that has $n$ distinct eigenvalues is similar to a diagonal matrix. 


In fact, we do not need the similar matrix to be diagonal for this concept to be useful. 
Suppose that $A$ is similar to a triangular matrix $B$. The determination of eigenvalues is easy 
for a triangular matrix $B$, for in this case $\lambda$ is a solution to the equation 

\[ 0 = \det(B - \lambda I) = \prod_{i=1}^{n} (b_{ii} - \lambda) \]

if and only if $\lambda = b_{ii}$ for some $i$. The next result describes a relationship, called a similarity 
transformation, between arbitrary matrices and triangular matrices. 
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Theorem 9.15 


Issai Schur (1875-1941) is 
mainly known for his work in 
group theory but he also worked 
in number theory, analysis, and 
other areas. He published what is 
now known as Schur’s Theorem 
in 1909. 


The /, norm of a unitary matrix 
is 1. 


Theorem 9.16 


Corollary 9.17 


Approximating Eigenvalues 


(Schur) 


Let A be an arbitrary matrix. A nonsingular matrix U exists with the property that 
Per AU, 


where T is an upper-triangular matrix whose diagonal entries consist of the eigenvalues 
of A. a 


The matrix U whose existence is ensured in Theorem 9.15 satisfies the condition 
|| Ux||2 = ||x|l2 for any vector x. Matrices with this property are called unitary. Although 
we will not make use of this norm-preserving property, it does significantly increase the 
application of Schur’s Theorem. 

Theorem 9.15 is an existence theorem that ensures that the triangular matrix T exists, 
but it does not provide a constructive means for finding T, since it requires a knowledge of 
the eigenvalues of A. In most instances, the similarity transformation U is too difficult to 
determine. 

The following result for symmetric matrices reduces the complication, because in this 
case the transformation matrix is orthogonal. 


The n x n matrix A is symmetric if and only if there exists a diagonal matrix D and an 
orthogonal matrix Q with A = QDQ'. rT] 


Proof First suppose that A = QDQ’, where Q is orthogonal and D is diagonal. Then 
A’ = (QDQ')' = (0')' DO! = QDA' =A, 


and A is symmetric. 

To prove that every symmetric matrix A can be written in the form A = QDQ’, first 
consider the distinct eigenvalues of A. If Av; = A,Vv, and Avz = Avo, with A; A Az, then 
since A‘ = A we have 


(Ay — Aa)viVo = (A1v1)/ V2 — Vj (A2V2) = (Avi)'¥2 — Vi (AV2) = ViA'v2 — viAv2 = 0, 


SO Vi V2 = 0. Hence we can choose orthonormal vectors for distinct eigenvalues by simply 
normalizing all these orthogonal eigenvectors. When the eigenvalues are not distinct, there 
will be subspaces of eigenvectors for each of the multiple eigenvalues, and with the help 
of the Gram-Schmidt orthogonalization process we can find a full set of n orthonormal 
eigenvectors. 28 


The following corollaries to Theorem 9.16 demonstrate some of the interesting prop- 
erties of symmetric matrices. 


Suppose that A is a symmetric n x n matrix. There exist n eigenvectors of A that form an 
orthonormal set, and the eigenvalues of A are real numbers. a 


Proof If $Q = (q_{ij})$ and $D = (d_{ij})$ are the matrices specified in Theorem 9.16, then

\[ D = Q^tAQ = Q^{-1}AQ \quad\text{implies that}\quad AQ = QD. \]

Let $1 \le i \le n$ and $\mathbf{v}_i = (q_{1i}, q_{2i}, \ldots, q_{ni})^t$ be the $i$th column of $Q$. Then

\[ A\mathbf{v}_i = d_{ii}\mathbf{v}_i, \]

and $d_{ii}$ is an eigenvalue of $A$ with eigenvector $\mathbf{v}_i$, the $i$th column of $Q$. The columns of $Q$
are orthonormal, so the eigenvectors of $A$ are orthonormal.

Multiplying this equation on the left by $\mathbf{v}_i^t$ gives

\[ \mathbf{v}_i^tA\mathbf{v}_i = d_{ii}\mathbf{v}_i^t\mathbf{v}_i. \]

Since $\mathbf{v}_i^tA\mathbf{v}_i$ and $\mathbf{v}_i^t\mathbf{v}_i$ are real numbers and $\mathbf{v}_i^t\mathbf{v}_i = 1$, the eigenvalue $d_{ii} = \mathbf{v}_i^tA\mathbf{v}_i$ is a real
number, for each $i = 1, 2, \ldots, n$.


Margin note: A symmetric matrix whose eigenvalues are all nonnegative real numbers is sometimes called nonnegative definite (or positive semidefinite).


Recall from Section 6.6 that a symmetric matrix $A$ is called positive definite if for all
nonzero vectors $\mathbf{x}$ we have $\mathbf{x}^tA\mathbf{x} > 0$. The following theorem characterizes positive definite
matrices in terms of eigenvalues. This eigenvalue property makes positive definite matrices
important in applications.


Theorem 9.18 A symmetric matrix $A$ is positive definite if and only if all the eigenvalues of $A$ are positive.


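Proof First suppose that $A$ is positive definite and that $\lambda$ is an eigenvalue of $A$ with associated
eigenvector $\mathbf{x}$, with $\|\mathbf{x}\|_2 = 1$. Then

\[ 0 < \mathbf{x}^tA\mathbf{x} = \lambda\mathbf{x}^t\mathbf{x} = \lambda\|\mathbf{x}\|_2^2 = \lambda. \]

To show the converse, suppose that $A$ is symmetric with positive eigenvalues. By
Corollary 9.17, $A$ has $n$ eigenvectors, $\mathbf{v}^{(1)}, \mathbf{v}^{(2)}, \ldots, \mathbf{v}^{(n)}$, that form an orthonormal and, by
Theorem 9.7, linearly independent set. Hence, for any $\mathbf{x} \neq \mathbf{0}$ there exists a unique set of
constants $\beta_1, \beta_2, \ldots, \beta_n$, not all zero, for which

\[ \mathbf{x} = \sum_{i=1}^{n} \beta_i\mathbf{v}^{(i)}. \]

Multiplying by $\mathbf{x}^tA$ gives

\[ \mathbf{x}^tA\mathbf{x} = \mathbf{x}^t\left(\sum_{i=1}^{n} \beta_iA\mathbf{v}^{(i)}\right) = \mathbf{x}^t\left(\sum_{i=1}^{n} \beta_i\lambda_i\mathbf{v}^{(i)}\right) = \sum_{j=1}^{n}\sum_{i=1}^{n} \beta_j\beta_i\lambda_i(\mathbf{v}^{(j)})^t\mathbf{v}^{(i)}. \]

But the vectors $\mathbf{v}^{(1)}, \mathbf{v}^{(2)}, \ldots, \mathbf{v}^{(n)}$ form an orthonormal set, so

\[ (\mathbf{v}^{(j)})^t\mathbf{v}^{(i)} = \begin{cases} 0, & \text{if } i \neq j, \\ 1, & \text{if } i = j. \end{cases} \]

This, together with the fact that the $\lambda_i$ are all positive, implies that

\[ \mathbf{x}^tA\mathbf{x} = \sum_{j=1}^{n}\sum_{i=1}^{n} \beta_j\beta_i\lambda_i(\mathbf{v}^{(j)})^t\mathbf{v}^{(i)} = \sum_{i=1}^{n} \lambda_i\beta_i^2 > 0. \]

Hence, $A$ is positive definite.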
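
Theorems 9.16 and 9.18 can be illustrated numerically. The sketch below assumes the symmetric matrix with eigenvalues 6, 3, and 1 that is used in the examples of Section 9.3; the unnormalized eigenvectors in `raw` were found by hand for this sketch and are not taken from the text, and `quad_form` is a hypothetical helper. It forms the entries of $Q^tAQ$ and evaluates $\mathbf{x}^tA\mathbf{x}$ for a few nonzero vectors.

```python
import math

A = [[4, -1, 1], [-1, 3, -2], [1, -2, 3]]  # symmetric; eigenvalues 6, 3, 1

# Mutually orthogonal eigenvectors for 6, 3, 1 (found by hand: an assumption of this sketch).
raw = [[1, -1, 1], [-2, -1, 1], [0, 1, 1]]
# Normalize each one; these are the columns of the orthogonal matrix Q.
cols = [[c / math.sqrt(sum(t * t for t in v)) for c in v] for v in raw]


def quad_form(u, M, w):
    """Return the scalar u^t M w."""
    n = len(M)
    return sum(u[i] * M[i][j] * w[j] for i in range(n) for j in range(n))


# Entry (i, j) of Q^t A Q; this should be close to diag(6, 3, 1).
D = [[quad_form(cols[i], A, cols[j]) for j in range(3)] for i in range(3)]

# Positive definiteness: x^t A x should be positive for every nonzero x.
samples = [[1, 0, 0], [1, 1, 1], [0, 1, 1], [-2, 5, 3]]
values = [quad_form(x, A, x) for x in samples]

print(D)
print(values)
```

A finite set of sample vectors cannot prove positive definiteness; the eigenvalue test in Theorem 9.18 is what settles it.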


EXERCISE SET 9.2 


1. 


Show that the following pairs of matrices are not similar. 


1 2 1 1 2 0 
«e A=] 0 1 2 and B=] 0 1 2 
0 0 2 1 0 2 
1 2 1 1 2 1 
d. A=] -3 2 2 and B= 0 1 2 
0 1 2 =o; 2-2 


Show that the following pairs of matrices are not similar. 


1 1 2 2 
a. A=| 4 | and a=| 4 5 | 


1 1 1: 2 
b. ae al and B=| I > | 
1 1 -1 2 -2 0 
ce A=] —-l O 1 and B=| —2 0 2 
0 1 1 2 2 -—2 
1 1 -1 12 1 
d A= 2 -—2 2 and B=} 2 3 2 
-—3 3 3 0 1 0 


Define A = PDP! for the following matrices D and P. Determine A?. 
2 -l 1 0 
3 1 ae | 0 2 
-1 2 0 
b P= | 1 0 and D= | 1 Ls 


—2 
0 
1 2 -l 0 
«e P=] 2 1 0 and D=] 0 
1 0 


a P= 


0 

1 

2 0 0 
2 -l 2 0 0 
d. P=} -1 a and D=] 0 2 0 
0 -l 2 0 0 2 


Determine A‘ for the matrices in Exercise 3. 


For each of the following matrices determine if it is diagonalizable and, if so, find $P$ and $D$ with
$A = PDP^{-1}$.


4 -1 2 -1 
a. a=| if ‘| b. dea | | 


2 0 1 1 1 1 
ce A=] 0 1 0 d. A=] 1 1 0 
1 0 2 1 0 1 


For each of the following matrices determine if it is diagonalizable and, if so, find $P$ and $D$ with
$A = PDP^{-1}$.


2. i 2 1 
a. a=| 4 | b. a=| 4 4 
2 1 #1 2 1 1 
ce A= 1 2 1 da A= 0 3 #1 
1 Lb 2 0 0 2 


(i) Determine if the following matrices are positive definite, and if so, (ii) construct an orthogonal 
matrix Q for which Q‘AQ = D, where D is a diagonal matrix. 


2 1 1 2 
a. a=| 4 ll b. a=| 4 | 
20 1 1 1 
ce A=|]0 2 0 d A=] 1 1 0 
1 0 2 1 0 1 
(i) Determine if the following matrices are positive definite, and if so, (ii) construct an orthogonal 
matrix Q for which Q'AQ = D, where D is a diagonal matrix. 
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10. 


11. 


12. 
13. 
14. 


15. 


16. 
17. 
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421 321 
a A=|2 4 0 bh A=| 2 2 0 
104 [101 
i <i =f 4 r8421 
= ee a 4821 
AS ed ey BG ie ed oe a 
i —- o@ 4 1118 


Show that each of the following matrices is nonsingular but not diagonalizable. 


2 1 0 2. =3 6 
a A=] 0 2 0 b A=] 0 3. -4 
0 0 3 | 0 2 -3 
2 1 -1 fF ol 0 0 
ce A=] 0 2 1 d. A=} -l 0 1 
0 0 3 {| -1 —-1l 2 
Show that the following matrices are singular but are diagonalizable. 
2 -1 0 f 2 -1 -1 
a A=] -1 2 0 b A=] -l 2 -1 
0 0 0 [ -l1 -l 2 


In Exercise 31 of Section 6.6, a symmetric matrix

\[ A = \begin{bmatrix} 1.59 & 1.69 & 2.13 \\ 1.69 & 1.31 & 1.72 \\ 2.13 & 1.72 & 1.85 \end{bmatrix} \]

was used to describe the average wing lengths of fruit flies that were offspring resulting from the
mating of three mutants of the flies. The entry $a_{ij}$ represents the average wing length of a fly that is
the offspring of a male fly of type $i$ and a female fly of type $j$.

a. Find the eigenvalues and associated eigenvectors of this matrix.

b. Is this matrix positive definite?

Suppose that $A$ and $B$ are nonsingular $n \times n$ matrices. Prove that $AB$ is similar to $BA$.
Show that if A is similar to B and B is similar to C, then A is similar to C. 

Show that if A is similar to B, then 

a. det(A) = det(B). 

b. The characteristic polynomial of A is the same as the characteristic polynomial of B. 
c. A is nonsingular if and only if B is nonsingular. 

d. If $A$ is nonsingular, show that $A^{-1}$ is similar to $B^{-1}$.

e. $A^t$ is similar to $B^t$.

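Show that the matrix given in Example 3 of Section 9.1,

\[ A = \begin{bmatrix} 2 & 0 & 0 \\ 1 & 1 & 2 \\ 1 & -1 & 4 \end{bmatrix}, \]

is similar to the diagonal matrices

\[ D_1 = \begin{bmatrix} 3 & 0 & 0 \\ 0 & 2 & 0 \\ 0 & 0 & 2 \end{bmatrix}, \quad D_2 = \begin{bmatrix} 2 & 0 & 0 \\ 0 & 3 & 0 \\ 0 & 0 & 2 \end{bmatrix}, \quad\text{and}\quad D_3 = \begin{bmatrix} 2 & 0 & 0 \\ 0 & 2 & 0 \\ 0 & 0 & 3 \end{bmatrix}. \]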


Prove Theorem 9.10. 
Show that there is no diagonal matrix similar to the matrix given in Example 4 of Section 9.1, 


& 

ll 
oon 
oN eS 
woe 
18. Prove that if $Q$ is a nonsingular matrix with $Q^t = Q^{-1}$, then $Q$ is orthogonal.
19. Prove Theorem 9.13. 


9.3 The Power Method


The Power method is an iterative technique used to determine the dominant eigenvalue
of a matrix, that is, the eigenvalue with the largest magnitude. By modifying the method
slightly, it can also be used to determine other eigenvalues. One useful feature of the Power
method is that it produces not only an eigenvalue, but also an associated eigenvector. In fact,
the Power method is often applied to find an eigenvector for an eigenvalue that is determined
by some other means.

Margin note: The name for the Power method is derived from the fact that the iterations exaggerate the relative size of the magnitudes of the eigenvalues.

To apply the Power method, we assume that the $n \times n$ matrix $A$ has $n$ eigenvalues
$\lambda_1, \lambda_2, \ldots, \lambda_n$ with an associated collection of linearly independent eigenvectors
$\{\mathbf{v}^{(1)}, \mathbf{v}^{(2)}, \mathbf{v}^{(3)}, \ldots, \mathbf{v}^{(n)}\}$. Moreover, we assume that $A$ has precisely one eigenvalue, $\lambda_1$, that is largest
in magnitude, so that

\[ |\lambda_1| > |\lambda_2| \ge |\lambda_3| \ge \cdots \ge |\lambda_n| \ge 0. \]


Example 4 of Section 9.1 illustrates that an n x n matrix need not have n linearly independent 
eigenvectors. When it does not the Power method may still be successful, but it is not 
guaranteed to be. 

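If $\mathbf{x}$ is any vector in $\mathbb{R}^n$, the fact that $\{\mathbf{v}^{(1)}, \mathbf{v}^{(2)}, \mathbf{v}^{(3)}, \ldots, \mathbf{v}^{(n)}\}$ is linearly independent
implies that constants $\beta_1, \beta_2, \ldots, \beta_n$ exist with

\[ \mathbf{x} = \sum_{j=1}^{n} \beta_j\mathbf{v}^{(j)}. \]

Multiplying both sides of this equation by $A, A^2, \ldots, A^k, \ldots$ gives

\[ A\mathbf{x} = \sum_{j=1}^{n} \beta_jA\mathbf{v}^{(j)} = \sum_{j=1}^{n} \beta_j\lambda_j\mathbf{v}^{(j)}, \qquad A^2\mathbf{x} = \sum_{j=1}^{n} \beta_j\lambda_jA\mathbf{v}^{(j)} = \sum_{j=1}^{n} \beta_j\lambda_j^2\mathbf{v}^{(j)}, \]

and generally, $A^k\mathbf{x} = \sum_{j=1}^{n} \beta_j\lambda_j^k\mathbf{v}^{(j)}$.
If $\lambda_1^k$ is factored from each term on the right side of the last equation, then

\[ A^k\mathbf{x} = \lambda_1^k \sum_{j=1}^{n} \beta_j\left(\frac{\lambda_j}{\lambda_1}\right)^k \mathbf{v}^{(j)}. \]

Since $|\lambda_1| > |\lambda_j|$, for all $j = 2, 3, \ldots, n$, we have $\lim_{k\to\infty} (\lambda_j/\lambda_1)^k = 0$, and

\[ \lim_{k\to\infty} A^k\mathbf{x} = \lim_{k\to\infty} \lambda_1^k\beta_1\mathbf{v}^{(1)}. \tag{9.2} \]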

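The sequence in Eq. (9.2) converges to 0 if $|\lambda_1| < 1$ and diverges if $|\lambda_1| > 1$, provided,
of course, that $\beta_1 \neq 0$. As a consequence, the entries in the $A^k\mathbf{x}$ will grow with $k$ if $|\lambda_1| > 1$
and will go to 0 if $|\lambda_1| < 1$, perhaps resulting in overflow or underflow. To take care of that
possibility, we scale the powers of $A^k\mathbf{x}$ in an appropriate manner to ensure that the limit in
Eq. (9.2) is finite and nonzero. The scaling begins by choosing $\mathbf{x}$ to be a unit vector $\mathbf{x}^{(0)}$
relative to $\|\cdot\|_\infty$ and choosing a component $x^{(0)}_{p_0}$ of $\mathbf{x}^{(0)}$ with

\[ x^{(0)}_{p_0} = 1 = \|\mathbf{x}^{(0)}\|_\infty. \]

Let $\mathbf{y}^{(1)} = A\mathbf{x}^{(0)}$, and define $\mu^{(1)} = y^{(1)}_{p_0}$. Then

\[ \mu^{(1)} = y^{(1)}_{p_0} = \frac{y^{(1)}_{p_0}}{x^{(0)}_{p_0}} = \frac{\beta_1\lambda_1v^{(1)}_{p_0} + \sum_{j=2}^{n} \beta_j\lambda_jv^{(j)}_{p_0}}{\beta_1v^{(1)}_{p_0} + \sum_{j=2}^{n} \beta_jv^{(j)}_{p_0}} = \lambda_1\left[\frac{\beta_1v^{(1)}_{p_0} + \sum_{j=2}^{n} \beta_j(\lambda_j/\lambda_1)v^{(j)}_{p_0}}{\beta_1v^{(1)}_{p_0} + \sum_{j=2}^{n} \beta_jv^{(j)}_{p_0}}\right]. \]

Let $p_1$ be the least integer such that

\[ |y^{(1)}_{p_1}| = \|\mathbf{y}^{(1)}\|_\infty, \]

and define $\mathbf{x}^{(1)}$ by

\[ \mathbf{x}^{(1)} = \frac{1}{y^{(1)}_{p_1}}\mathbf{y}^{(1)} = \frac{1}{y^{(1)}_{p_1}}A\mathbf{x}^{(0)}. \]

Then

\[ x^{(1)}_{p_1} = 1 = \|\mathbf{x}^{(1)}\|_\infty. \]

Now define

\[ \mathbf{y}^{(2)} = A\mathbf{x}^{(1)} = \frac{1}{y^{(1)}_{p_1}}A^2\mathbf{x}^{(0)} \]

and

\[ \mu^{(2)} = y^{(2)}_{p_1} = \frac{y^{(2)}_{p_1}}{x^{(1)}_{p_1}} = \frac{\left[\beta_1\lambda_1^2v^{(1)}_{p_1} + \sum_{j=2}^{n} \beta_j\lambda_j^2v^{(j)}_{p_1}\right]\big/ y^{(1)}_{p_1}}{\left[\beta_1\lambda_1v^{(1)}_{p_1} + \sum_{j=2}^{n} \beta_j\lambda_jv^{(j)}_{p_1}\right]\big/ y^{(1)}_{p_1}} = \lambda_1\left[\frac{\beta_1v^{(1)}_{p_1} + \sum_{j=2}^{n} \beta_j(\lambda_j/\lambda_1)^2v^{(j)}_{p_1}}{\beta_1v^{(1)}_{p_1} + \sum_{j=2}^{n} \beta_j(\lambda_j/\lambda_1)v^{(j)}_{p_1}}\right]. \]

Let $p_2$ be the smallest integer with

\[ |y^{(2)}_{p_2}| = \|\mathbf{y}^{(2)}\|_\infty, \]

and define

\[ \mathbf{x}^{(2)} = \frac{1}{y^{(2)}_{p_2}}\mathbf{y}^{(2)} = \frac{1}{y^{(2)}_{p_2}}A\mathbf{x}^{(1)} = \frac{1}{y^{(2)}_{p_2}y^{(1)}_{p_1}}A^2\mathbf{x}^{(0)}. \]

In a similar manner, define sequences of vectors $\{\mathbf{x}^{(m)}\}_{m=0}^{\infty}$ and $\{\mathbf{y}^{(m)}\}_{m=1}^{\infty}$, and a sequence
of scalars $\{\mu^{(m)}\}_{m=1}^{\infty}$ inductively by

\[ \mathbf{y}^{(m)} = A\mathbf{x}^{(m-1)}, \]

\[ \mu^{(m)} = y^{(m)}_{p_{m-1}} = \lambda_1\left[\frac{\beta_1v^{(1)}_{p_{m-1}} + \sum_{j=2}^{n} (\lambda_j/\lambda_1)^m\beta_jv^{(j)}_{p_{m-1}}}{\beta_1v^{(1)}_{p_{m-1}} + \sum_{j=2}^{n} (\lambda_j/\lambda_1)^{m-1}\beta_jv^{(j)}_{p_{m-1}}}\right], \tag{9.3} \]

and

\[ \mathbf{x}^{(m)} = \frac{\mathbf{y}^{(m)}}{y^{(m)}_{p_m}} = \frac{A^m\mathbf{x}^{(0)}}{\prod_{k=1}^{m} y^{(k)}_{p_k}}, \]

where at each step, $p_m$ is used to represent the smallest integer for which

\[ |y^{(m)}_{p_m}| = \|\mathbf{y}^{(m)}\|_\infty. \]

By examining Eq. (9.3), we see that since $|\lambda_j/\lambda_1| < 1$, for each $j = 2, 3, \ldots, n$,
$\lim_{m\to\infty} \mu^{(m)} = \lambda_1$, provided that $\mathbf{x}^{(0)}$ is chosen so that $\beta_1 \neq 0$. Moreover, the sequence of
vectors $\{\mathbf{x}^{(m)}\}_{m=0}^{\infty}$ converges to an eigenvector associated with $\lambda_1$ that has $l_\infty$ norm equal to
one.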


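Illustration The matrix

\[ A = \begin{bmatrix} -2 & -3 \\ 6 & 7 \end{bmatrix} \]

has eigenvalues $\lambda_1 = 4$ and $\lambda_2 = 1$ with corresponding eigenvectors $\mathbf{v}_1 = (1,-2)^t$ and
$\mathbf{v}_2 = (1,-1)^t$. If we start with the arbitrary vector $\mathbf{x}_0 = (1, 1)^t$ and multiply by the matrix
$A$ we obtain

\[ \mathbf{x}_1 = A\mathbf{x}_0 = \begin{bmatrix} -5 \\ 13 \end{bmatrix}, \quad \mathbf{x}_2 = A\mathbf{x}_1 = \begin{bmatrix} -29 \\ 61 \end{bmatrix}, \quad \mathbf{x}_3 = A\mathbf{x}_2 = \begin{bmatrix} -125 \\ 253 \end{bmatrix}, \]

\[ \mathbf{x}_4 = A\mathbf{x}_3 = \begin{bmatrix} -509 \\ 1021 \end{bmatrix}, \quad \mathbf{x}_5 = A\mathbf{x}_4 = \begin{bmatrix} -2045 \\ 4093 \end{bmatrix}, \quad \mathbf{x}_6 = A\mathbf{x}_5 = \begin{bmatrix} -8189 \\ 16381 \end{bmatrix}. \]

As a consequence, approximations to the dominant eigenvalue $\lambda_1 = 4$ are

\[ \lambda_1^{(1)} = \frac{61}{13} = 4.6923, \quad \lambda_1^{(2)} = \frac{253}{61} = 4.14754, \quad \lambda_1^{(3)} = \frac{1021}{253} = 4.03557, \]

\[ \lambda_1^{(4)} = \frac{4093}{1021} = 4.00881, \quad \lambda_1^{(5)} = \frac{16381}{4093} = 4.00220. \]

An approximate eigenvector corresponding to $\lambda_1^{(5)} = \frac{16381}{4093} = 4.00220$ is
$\mathbf{x}_6 = (-8189, 16381)^t$, which, divided by 16381, normalizes to $(-0.49991, 1)^t \approx -\tfrac{1}{2}\mathbf{v}_1$.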
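
The unscaled iteration in this Illustration is easy to reproduce. The following is a minimal sketch, assuming the matrix is the one determined by the stated eigenpairs, $A = \begin{bmatrix} -2 & -3 \\ 6 & 7 \end{bmatrix}$; the variable names are introduced here for illustration only. Integer arithmetic keeps the iterates exact.

```python
# Matrix determined by the eigenpairs 4 with (1,-2)^t and 1 with (1,-1)^t (assumed reading).
A = [[-2, -3], [6, 7]]

x = [1, 1]          # arbitrary starting vector x_0
iterates = [x]
for _ in range(6):
    # x_{k+1} = A x_k, computed exactly with integers.
    x = [A[0][0] * x[0] + A[0][1] * x[1],
         A[1][0] * x[0] + A[1][1] * x[1]]
    iterates.append(x)

# Ratios of the second components of successive iterates approximate lambda_1 = 4.
ratios = [iterates[k + 1][1] / iterates[k][1] for k in range(1, 6)]

for k, r in enumerate(ratios, start=1):
    print(k, round(r, 5))
```

The entries roughly quadruple at every step, which is the growth that the scaling in the Power method is designed to control.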


The Power method has the disadvantage that it is unknown at the outset whether or not 
the matrix has a single dominant eigenvalue. Nor is it known how x should be chosen so 
as to ensure that its representation in terms of the eigenvectors of the matrix will contain a 
nonzero contribution from the eigenvector associated with the dominant eigenvalue, should 
it exist. 

Algorithm 9.1 implements the Power method. 


Power Method


To approximate the dominant eigenvalue and an associated eigenvector of the $n \times n$ matrix
$A$ given a nonzero vector $\mathbf{x}$:


INPUT dimension $n$; matrix $A$; vector $\mathbf{x}$; tolerance TOL; maximum number of iterations $N$.


OUTPUT approximate eigenvalue $\mu$; approximate eigenvector $\mathbf{x}$ (with $\|\mathbf{x}\|_\infty = 1$) or a
message that the maximum number of iterations was exceeded.

Step 1 Set $k = 1$.
Step 2 Find the smallest integer $p$ with $1 \le p \le n$ and $|x_p| = \|\mathbf{x}\|_\infty$.
Step 3 Set $\mathbf{x} = \mathbf{x}/x_p$.
Step 4 While ($k \le N$) do Steps 5-11.
    Step 5 Set $\mathbf{y} = A\mathbf{x}$.
    Step 6 Set $\mu = y_p$.
    Step 7 Find the smallest integer $p$ with $1 \le p \le n$ and $|y_p| = \|\mathbf{y}\|_\infty$.
    Step 8 If $y_p = 0$ then OUTPUT ('Eigenvector', $\mathbf{x}$);
               OUTPUT ('A has the eigenvalue 0, select a new vector x and restart');
               STOP.
    Step 9 Set ERR $= \|\mathbf{x} - (\mathbf{y}/y_p)\|_\infty$;
               $\mathbf{x} = \mathbf{y}/y_p$.
    Step 10 If ERR < TOL then OUTPUT ($\mu$, $\mathbf{x}$);
               (The procedure was successful.)
               STOP.
    Step 11 Set $k = k + 1$.
Step 12 OUTPUT ('The maximum number of iterations exceeded');
        (The procedure was unsuccessful.)
        STOP.
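
The steps of Algorithm 9.1 translate fairly directly into code. The following is a sketch rather than a definitive implementation: the function name `power_method`, its default tolerance, and the use of exceptions in place of the OUTPUT messages are choices made here, and the test matrix is assumed to be the matrix of Example 1 with eigenvalues 6, 3, and 2.

```python
def power_method(A, x, tol=1e-6, max_iter=100):
    """Approximate the dominant eigenvalue of A, scaling by the l-infinity norm."""
    n = len(A)
    # Steps 2-3: smallest index p with |x_p| = ||x||_inf, then scale x.
    p = max(range(n), key=lambda i: abs(x[i]))  # max() keeps the first maximal index
    x = [t / x[p] for t in x]
    for k in range(1, max_iter + 1):
        # Step 5: y = A x.
        y = [sum(A[i][j] * x[j] for j in range(n)) for i in range(n)]
        # Step 6: eigenvalue estimate from the component used to scale x.
        mu = y[p]
        # Step 7: index of the largest component of y in magnitude.
        p = max(range(n), key=lambda i: abs(y[i]))
        # Step 8: y = 0 means x is an eigenvector for the eigenvalue 0.
        if y[p] == 0:
            raise ValueError("A has the eigenvalue 0; select a new vector x and restart")
        # Step 9: scale y and measure the change in the eigenvector estimate.
        x_new = [t / y[p] for t in y]
        err = max(abs(a - b) for a, b in zip(x, x_new))
        x = x_new
        # Step 10: stop once successive eigenvector estimates agree.
        if err < tol:
            return mu, x, k
    # Step 12: iteration limit reached.
    raise RuntimeError("maximum number of iterations exceeded")


# Assumed test matrix (eigenvalues 6, 3, 2), with starting vector (1, 1, 1)^t.
A = [[-4, 14, 0], [-5, 13, 0], [-1, 0, 2]]
mu, x, iterations = power_method(A, [1, 1, 1])
print(mu, x, iterations)
```

The stopping test watches the eigenvector estimate, not the eigenvalue estimate, so the returned `mu` is typically less accurate than `tol` might suggest.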


Accelerating Convergence 


Choosing, in Step 7, the smallest integer $p_m$ for which $|y^{(m)}_{p_m}| = \|\mathbf{y}^{(m)}\|_\infty$ will generally
ensure that this index eventually becomes invariant. The rate at which $\{\mu^{(m)}\}_{m=1}^{\infty}$ converges
to $\lambda_1$ is determined by the ratios $|\lambda_j/\lambda_1|^m$, for $j = 2, 3, \ldots, n$, and in particular by $|\lambda_2/\lambda_1|^m$.
The rate of convergence is $O(|\lambda_2/\lambda_1|^m)$ (see [IK, p. 148]), so there is a constant $k$ such that
for large $m$,

\[ |\mu^{(m)} - \lambda_1| \approx k\left|\frac{\lambda_2}{\lambda_1}\right|^m, \]

which implies that

\[ \lim_{m\to\infty} \frac{|\mu^{(m+1)} - \lambda_1|}{|\mu^{(m)} - \lambda_1|} \approx \left|\frac{\lambda_2}{\lambda_1}\right| < 1. \]

The sequence $\{\mu^{(m)}\}$ converges linearly to $\lambda_1$, so Aitken's $\Delta^2$ procedure discussed in Section
2.5 can be used to speed the convergence. Implementing the $\Delta^2$ procedure in Algorithm
9.1 is accomplished by modifying the algorithm as follows:


Step 1 Set $k = 1$;
           $\mu_0 = 0$;
           $\mu_1 = 0$.
Step 6 Set $\mu = y_p$;
           $\hat{\mu} = \mu_0 - \dfrac{(\mu_1 - \mu_0)^2}{\mu - 2\mu_1 + \mu_0}$.
Step 10 If ERR < TOL and $k \ge 4$ then OUTPUT ($\hat{\mu}$, $\mathbf{x}$);
           STOP.
Step 11 Set $k = k + 1$;
           $\mu_0 = \mu_1$;
           $\mu_1 = \mu$.
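
The Aitken update in the modified Step 6 uses only three consecutive estimates. Below is a minimal sketch of that formula, with hypothetical helper names `aitken` and `accelerate`; returning `None` when the denominator vanishes is a choice made here, since the formula is undefined in that case.

```python
def aitken(mu0, mu1, mu2):
    """Aitken's Delta^2 estimate from three consecutive terms of a sequence."""
    denom = mu2 - 2 * mu1 + mu0
    if denom == 0:
        return None  # the formula is undefined for these three terms
    return mu0 - (mu1 - mu0) ** 2 / denom


def accelerate(seq):
    """Apply Aitken's Delta^2 formula to every window of three consecutive terms."""
    return [aitken(seq[i], seq[i + 1], seq[i + 2]) for i in range(len(seq) - 2)]


# Three successive eigenvalue estimates 10, 7.2, 6.5 for a dominant eigenvalue of 6.
print(accelerate([10, 7.2, 6.5]))
```

For the estimates 10, 7.2, and 6.5 the accelerated value is about 6.266667, already much closer to 6 than any of the three inputs.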


In actuality, it is not necessary for the matrix to have distinct eigenvalues for the Power
method to converge. If the matrix has a unique dominant eigenvalue, $\lambda_1$, with multiplicity $r$
greater than 1 and $\mathbf{v}^{(1)}, \mathbf{v}^{(2)}, \ldots, \mathbf{v}^{(r)}$ are linearly independent eigenvectors associated with
$\lambda_1$, the procedure will still converge to $\lambda_1$. The sequence of vectors $\{\mathbf{x}^{(m)}\}_{m=0}^{\infty}$ will, in this
case, converge to an eigenvector of $\lambda_1$ of $l_\infty$ norm equal to one that depends on the choice of
the initial vector $\mathbf{x}^{(0)}$ and is a linear combination of $\mathbf{v}^{(1)}, \mathbf{v}^{(2)}, \ldots, \mathbf{v}^{(r)}$. (See [Wil2], page 570.)


Example 1 Use the Power method to approximate the dominant eigenvalue of the matrix

\[ A = \begin{bmatrix} -4 & 14 & 0 \\ -5 & 13 & 0 \\ -1 & 0 & 2 \end{bmatrix}, \]

and then apply Aitken's $\Delta^2$ method to the approximations to the eigenvalue of the matrix
to accelerate the convergence.


Solution This matrix has eigenvalues $\lambda_1 = 6$, $\lambda_2 = 3$, and $\lambda_3 = 2$, so the Power method
described in Algorithm 9.1 will converge. Let $\mathbf{x}^{(0)} = (1, 1, 1)^t$, then

\[ \mathbf{y}^{(1)} = A\mathbf{x}^{(0)} = (10, 8, 1)^t, \]

so

\[ \|\mathbf{y}^{(1)}\|_\infty = 10, \quad \mu^{(1)} = y^{(1)}_1 = 10, \quad\text{and}\quad \mathbf{x}^{(1)} = \frac{\mathbf{y}^{(1)}}{10} = (1, 0.8, 0.1)^t. \]

Continuing in this manner leads to the values in Table 9.1, where $\hat{\mu}^{(m)}$ represents
the sequence generated by the Aitken's $\Delta^2$ procedure. An approximation to the dominant
eigenvalue, 6, at this stage is $\hat{\mu}^{(10)} = 6.000000$. The approximate $l_\infty$-unit eigenvector for
the eigenvalue 6 is $(\mathbf{x}^{(12)})^t = (1, 0.714316, -0.249895)$.


Table 9.1

 m    (x^(m))^t                        mu^(m)      mu-hat^(m)
 0    (1, 1, 1)
 1    (1, 0.8, 0.1)                    10          6.266667
 2    (1, 0.75, -0.111)                7.2         6.062473
 3    (1, 0.730769, -0.188803)         6.5         6.015054
 4    (1, 0.722200, -0.220850)         6.230769    6.004202
 5    (1, 0.718182, -0.235915)         6.111000    6.000855
 6    (1, 0.716216, -0.243095)         6.054546    6.000240
 7    (1, 0.715247, -0.246588)         6.027027    6.000058
 8    (1, 0.714765, -0.248306)         6.013453    6.000017
 9    (1, 0.714525, -0.249157)         6.006711    6.000003
10    (1, 0.714405, -0.249579)         6.003352    6.000000
11    (1, 0.714346, -0.249790)         6.001675
12    (1, 0.714316, -0.249895)         6.000837


Although the approximation to the eigenvalue is correct to the places listed, the eigenvector
approximation is considerably less accurate to the true eigenvector, $(1, 5/7, -1/4)^t \approx
(1, 0.714286, -0.25)^t$.


Symmetric Matrices 


When $A$ is symmetric, a variation in the choice of the vectors $\mathbf{x}^{(m)}$ and $\mathbf{y}^{(m)}$ and the
scalars $\mu^{(m)}$ can be made to significantly improve the rate of convergence of the sequence
$\{\mu^{(m)}\}_{m=1}^{\infty}$ to the dominant eigenvalue $\lambda_1$. In fact, although the rate of convergence of the
general Power method is $O(|\lambda_2/\lambda_1|^m)$, the rate of convergence of the modified procedure
given in Algorithm 9.2 for symmetric matrices is $O(|\lambda_2/\lambda_1|^{2m})$. (See [IK, pp. 149 ff].)
Because the sequence $\{\mu^{(m)}\}$ is still linearly convergent, Aitken's $\Delta^2$ procedure can also
be applied.


Symmetric Power Method


To approximate the dominant eigenvalue and an associated eigenvector of the $n \times n$ symmetric
matrix $A$, given a nonzero vector $\mathbf{x}$:

INPUT dimension $n$; matrix $A$; vector $\mathbf{x}$; tolerance TOL; maximum number of iterations $N$.


OUTPUT approximate eigenvalue $\mu$; approximate eigenvector $\mathbf{x}$ (with $\|\mathbf{x}\|_2 = 1$) or a
message that the maximum number of iterations was exceeded.


Step 1 Set $k = 1$;
           $\mathbf{x} = \mathbf{x}/\|\mathbf{x}\|_2$.
Step 2 While ($k \le N$) do Steps 3-8.
    Step 3 Set $\mathbf{y} = A\mathbf{x}$.
    Step 4 Set $\mu = \mathbf{x}^t\mathbf{y}$.
    Step 5 If $\|\mathbf{y}\|_2 = 0$, then OUTPUT ('Eigenvector', $\mathbf{x}$);
               OUTPUT ('A has eigenvalue 0, select new vector x and restart');
               STOP.
    Step 6 Set ERR $= \left\|\mathbf{x} - \dfrac{\mathbf{y}}{\|\mathbf{y}\|_2}\right\|_2$;
               $\mathbf{x} = \mathbf{y}/\|\mathbf{y}\|_2$.
    Step 7 If ERR < TOL then OUTPUT ($\mu$, $\mathbf{x}$);
               (The procedure was successful.)
               STOP.
    Step 8 Set $k = k + 1$.
Step 9 OUTPUT ('Maximum number of iterations exceeded');
       (The procedure was unsuccessful.)
       STOP.
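
A direct transcription of the Symmetric Power method is sketched below. The function name `symmetric_power_method`, the returned list of estimates, and the exceptions used in place of the OUTPUT messages are assumptions of this sketch; the test matrix is assumed to be the symmetric matrix with eigenvalues 6, 3, and 1 used in the examples of this section.

```python
import math


def symmetric_power_method(A, x, tol=1e-6, max_iter=100):
    """Approximate the dominant eigenvalue of a symmetric matrix A, scaling by the l2 norm."""
    n = len(A)
    # Step 1: normalize the starting vector in the l2 norm.
    norm_x = math.sqrt(sum(t * t for t in x))
    x = [t / norm_x for t in x]
    history = []  # successive eigenvalue estimates mu^(1), mu^(2), ...
    for k in range(1, max_iter + 1):
        # Step 3: y = A x.
        y = [sum(A[i][j] * x[j] for j in range(n)) for i in range(n)]
        # Step 4: mu = x^t y, the Rayleigh quotient of the unit vector x.
        mu = sum(x[i] * y[i] for i in range(n))
        history.append(mu)
        # Step 5: y = 0 means x is an eigenvector for the eigenvalue 0.
        norm_y = math.sqrt(sum(t * t for t in y))
        if norm_y == 0:
            raise ValueError("A has eigenvalue 0; select a new vector x and restart")
        # Step 6: normalize y and measure the change in the eigenvector estimate.
        x_new = [t / norm_y for t in y]
        err = math.sqrt(sum((a - b) ** 2 for a, b in zip(x, x_new)))
        x = x_new
        # Step 7: stop once successive eigenvector estimates agree.
        if err < tol:
            return mu, x, history
    # Step 9: iteration limit reached.
    raise RuntimeError("maximum number of iterations exceeded")


# Assumed symmetric test matrix (eigenvalues 6, 3, 1), starting from (1, 0, 0)^t.
A = [[4, -1, 1], [-1, 3, -2], [1, -2, 3]]
mu, x, history = symmetric_power_method(A, [1, 0, 0])
print(mu, x)
print(history[:3])
```

Because the eigenvalue estimate is a Rayleigh quotient, its error shrinks roughly like the square of the eigenvector error, which is why `mu` is far more accurate than `x` when the loop stops.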
Example 2 Apply both the Power method and the Symmetric Power method to the matrix

\[ A = \begin{bmatrix} 4 & -1 & 1 \\ -1 & 3 & -2 \\ 1 & -2 & 3 \end{bmatrix}, \]

using Aitken's $\Delta^2$ method to accelerate the convergence.


Solution This matrix has eigenvalues $\lambda_1 = 6$, $\lambda_2 = 3$, and $\lambda_3 = 1$. An eigenvector for the
eigenvalue 6 is $(1, -1, 1)^t$. Applying the Power method to this matrix with initial vector
$(1, 0, 0)^t$ gives the values in Table 9.2.


Table 9.2

 m    (y^(m))^t                            mu^(m)      mu-hat^(m)   (x^(m))^t with ||x^(m)||_inf = 1
 0                                                                  (1, 0, 0)
 1    (4, -1, 1)                           4                        (1, -0.25, 0.25)
 2    (4.5, -2.25, 2.25)                   4.5         7            (1, -0.5, 0.5)
 3    (5, -3.5, 3.5)                       5           6.2          (1, -0.7, 0.7)
 4    (5.4, -4.5, 4.5)                     5.4         6.047617     (1, -0.8333, 0.8333)
 5    (5.666, -5.1666, 5.1666)             5.666       6.011767     (1, -0.911765, 0.911765)
 6    (5.823529, -5.558824, 5.558824)      5.823529    6.002931     (1, -0.954545, 0.954545)
 7    (5.909091, -5.772727, 5.772727)      5.909091    6.000733     (1, -0.976923, 0.976923)
 8    (5.953846, -5.884615, 5.884615)      5.953846    6.000184     (1, -0.988372, 0.988372)
 9    (5.976744, -5.941861, 5.941861)      5.976744                 (1, -0.994163, 0.994163)
10    (5.988327, -5.970817, 5.970817)      5.988327                 (1, -0.997076, 0.997076)

We will now apply the Symmetric Power method to this matrix with the same initial
vector $(1, 0, 0)^t$. The first steps are

\[ \mathbf{x}^{(0)} = (1, 0, 0)^t, \quad A\mathbf{x}^{(0)} = (4, -1, 1)^t, \quad \mu^{(1)} = 4, \]

and

\[ \mathbf{x}^{(1)} = \frac{1}{\|A\mathbf{x}^{(0)}\|_2} \cdot A\mathbf{x}^{(0)} = (0.942809, -0.235702, 0.235702)^t. \]

The remaining entries are shown in Table 9.3.


Table 9.3

 m    (y^(m))^t                            mu^(m)      mu-hat^(m)   (x^(m))^t with ||x^(m)||_2 = 1
 0    (1, 0, 0)                                                     (1, 0, 0)
 1    (4, -1, 1)                           4           7            (0.942809, -0.235702, 0.235702)
 2    (4.242641, -2.121320, 2.121320)      5           6.047619     (0.816497, -0.408248, 0.408248)
 3    (4.082483, -2.857738, 2.857738)      5.666667    6.002932     (0.710669, -0.497468, 0.497468)
 4    (3.837613, -3.198011, 3.198011)      5.909091    6.000183     (0.646997, -0.539164, 0.539164)
 5    (3.666314, -3.342816, 3.342816)      5.976744    6.000012     (0.612836, -0.558763, 0.558763)
 6    (3.568871, -3.406650, 3.406650)      5.994152    6.000000     (0.595247, -0.568190, 0.568190)
 7    (3.517370, -3.436200, 3.436200)      5.998536    6.000000     (0.586336, -0.572805, 0.572805)
 8    (3.490952, -3.450359, 3.450359)      5.999634                 (0.581852, -0.575086, 0.575086)
 9    (3.477580, -3.457283, 3.457283)      5.999908                 (0.579603, -0.576220, 0.576220)
10    (3.470854, -3.460706, 3.460706)      5.999977                 (0.578477, -0.576786, 0.576786)


The Symmetric Power method gives considerably faster convergence for this matrix
than the Power method. The eigenvector approximations in the Power method converge to
$(1, -1, 1)^t$, a vector with unit $l_\infty$-norm. In the Symmetric Power method, the convergence
is to the parallel vector $(\sqrt{3}/3, -\sqrt{3}/3, \sqrt{3}/3)^t$, which has unit $l_2$-norm.

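If $\lambda$ is a real number that approximates an eigenvalue of a symmetric matrix $A$ and $\mathbf{x}$ is
an associated approximate eigenvector, then $A\mathbf{x} - \lambda\mathbf{x}$ is approximately the zero vector. The
following theorem relates the norm of this vector to the accuracy of $\lambda$ to the eigenvalue.


Theorem 9.19 Suppose that $A$ is an $n \times n$ symmetric matrix with eigenvalues $\lambda_1, \lambda_2, \ldots, \lambda_n$. If we have
$\|A\mathbf{x} - \lambda\mathbf{x}\|_2 < \varepsilon$ for some real number $\lambda$ and vector $\mathbf{x}$ with $\|\mathbf{x}\|_2 = 1$, then

\[ \min_{1 \le j \le n} |\lambda_j - \lambda| < \varepsilon. \]


Proof Suppose that $\mathbf{v}^{(1)}, \mathbf{v}^{(2)}, \ldots, \mathbf{v}^{(n)}$ form an orthonormal set of eigenvectors of $A$ associated,
respectively, with the eigenvalues $\lambda_1, \lambda_2, \ldots, \lambda_n$. By Theorems 9.5 and 9.3, $\mathbf{x}$ can be
expressed, for some unique set of constants $\beta_1, \beta_2, \ldots, \beta_n$, as

\[ \mathbf{x} = \sum_{j=1}^{n} \beta_j\mathbf{v}^{(j)}. \]

Thus

\[ \|A\mathbf{x} - \lambda\mathbf{x}\|_2^2 = \left\|\sum_{j=1}^{n} \beta_j(\lambda_j - \lambda)\mathbf{v}^{(j)}\right\|_2^2 = \sum_{j=1}^{n} |\beta_j|^2|\lambda_j - \lambda|^2 \ge \min_{1 \le j \le n} |\lambda_j - \lambda|^2 \sum_{j=1}^{n} |\beta_j|^2. \]

But

\[ \sum_{j=1}^{n} |\beta_j|^2 = \|\mathbf{x}\|_2^2 = 1, \quad\text{so}\quad \varepsilon > \|A\mathbf{x} - \lambda\mathbf{x}\|_2 \ge \min_{1 \le j \le n} |\lambda_j - \lambda|. \]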
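
The bound in Theorem 9.19 can be observed on a concrete case. The sketch below assumes the symmetric matrix with eigenvalues 6, 3, and 1 from Example 2 and takes as approximate eigenvector the entry for $m = 4$ of the Symmetric Power method table; choosing the Rayleigh quotient as the approximation `lam` is a choice made for this sketch.

```python
import math

A = [[4, -1, 1], [-1, 3, -2], [1, -2, 3]]  # symmetric; eigenvalues 6, 3, 1
eigenvalues = [6, 3, 1]

# Approximate eigenvector (rounded tabulated value), renormalized to unit l2 norm.
x = [0.646997, -0.539164, 0.539164]
norm_x = math.sqrt(sum(t * t for t in x))
x = [t / norm_x for t in x]

Ax = [sum(A[i][j] * x[j] for j in range(3)) for i in range(3)]
lam = sum(x[i] * Ax[i] for i in range(3))  # approximation to the eigenvalue

# Residual ||Ax - lam x||_2 and the distance from lam to the nearest eigenvalue.
residual = math.sqrt(sum((Ax[i] - lam * x[i]) ** 2 for i in range(3)))
gap = min(abs(ev - lam) for ev in eigenvalues)

print(lam, residual, gap)
```

The residual is computable without knowing any eigenvalue, so it gives a practical upper bound on the eigenvalue error; here the bound holds with considerable room to spare.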


Inverse Power Method 


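The Inverse Power method is a modification of the Power method that gives faster convergence.
It is used to determine the eigenvalue of $A$ that is closest to a specified number $q$.
Suppose the matrix $A$ has eigenvalues $\lambda_1, \ldots, \lambda_n$ with linearly independent eigenvectors
$\mathbf{v}^{(1)}, \ldots, \mathbf{v}^{(n)}$. The eigenvalues of $(A - qI)^{-1}$, where $q \neq \lambda_i$, for $i = 1, 2, \ldots, n$, are

\[ \frac{1}{\lambda_1 - q}, \quad \frac{1}{\lambda_2 - q}, \quad \ldots, \quad \frac{1}{\lambda_n - q}, \]

with these same eigenvectors $\mathbf{v}^{(1)}, \mathbf{v}^{(2)}, \ldots, \mathbf{v}^{(n)}$. (See Exercise 15 of Section 7.2.)
Applying the Power method to $(A - qI)^{-1}$ gives

\[ \mathbf{y}^{(m)} = (A - qI)^{-1}\mathbf{x}^{(m-1)}, \]

\[ \mu^{(m)} = y^{(m)}_{p_{m-1}} = \frac{y^{(m)}_{p_{m-1}}}{x^{(m-1)}_{p_{m-1}}} = \frac{\sum_{j=1}^{n} \beta_j\dfrac{1}{(\lambda_j - q)^m}v^{(j)}_{p_{m-1}}}{\sum_{j=1}^{n} \beta_j\dfrac{1}{(\lambda_j - q)^{m-1}}v^{(j)}_{p_{m-1}}}, \tag{9.4} \]

and

\[ \mathbf{x}^{(m)} = \frac{\mathbf{y}^{(m)}}{y^{(m)}_{p_m}}, \]

where, at each step, $p_m$ represents the smallest integer for which $|y^{(m)}_{p_m}| = \|\mathbf{y}^{(m)}\|_\infty$. The
sequence $\{\mu^{(m)}\}$ in Eq. (9.4) converges to $1/(\lambda_k - q)$, where

\[ \frac{1}{|\lambda_k - q|} = \max_{1 \le i \le n} \frac{1}{|\lambda_i - q|}, \]

and $\lambda_k \approx q + 1/\mu^{(m)}$ is the eigenvalue of $A$ closest to $q$.
With $k$ known, Eq. (9.4) can be written as

\[ \mu^{(m)} = \frac{1}{\lambda_k - q}\left[\frac{\beta_kv^{(k)}_{p_{m-1}} + \sum_{j \neq k} \beta_j\left[\dfrac{\lambda_k - q}{\lambda_j - q}\right]^m v^{(j)}_{p_{m-1}}}{\beta_kv^{(k)}_{p_{m-1}} + \sum_{j \neq k} \beta_j\left[\dfrac{\lambda_k - q}{\lambda_j - q}\right]^{m-1} v^{(j)}_{p_{m-1}}}\right]. \tag{9.5} \]

Thus, the choice of $q$ determines the convergence, provided that $1/(\lambda_k - q)$ is a unique
dominant eigenvalue of $(A - qI)^{-1}$ (although it may be a multiple eigenvalue). The closer
$q$ is to an eigenvalue $\lambda_k$, the faster the convergence since the convergence is of order

\[ O\left(\left|\frac{(\lambda - q)^{-1}}{(\lambda_k - q)^{-1}}\right|^m\right) = O\left(\left|\frac{\lambda_k - q}{\lambda - q}\right|^m\right), \]

where $\lambda$ represents the eigenvalue of $A$ that is second closest to $q$.
The vector $\mathbf{y}^{(m)}$ is obtained by solving the linear system

\[ (A - qI)\mathbf{y}^{(m)} = \mathbf{x}^{(m-1)}. \]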


In general, Gaussian elimination with pivoting is used, but as in the case of the LU factorization,
the multipliers can be saved to reduce the computation. The selection of $q$ can be
based on the Geršgorin Circle Theorem or on another means of localizing an eigenvalue.
Algorithm 9.3 computes $q$ from an initial approximation to the eigenvector $\mathbf{x}^{(0)}$ by

\[ q = \frac{\mathbf{x}^{(0)t}A\mathbf{x}^{(0)}}{\mathbf{x}^{(0)t}\mathbf{x}^{(0)}}. \]

This choice of $q$ results from the observation that if $\mathbf{x}$ is an eigenvector of $A$ with respect to
the eigenvalue $\lambda$, then $A\mathbf{x} = \lambda\mathbf{x}$. So $\mathbf{x}^tA\mathbf{x} = \lambda\mathbf{x}^t\mathbf{x}$ and

\[ \lambda = \frac{\mathbf{x}^tA\mathbf{x}}{\mathbf{x}^t\mathbf{x}} = \frac{\mathbf{x}^tA\mathbf{x}}{\|\mathbf{x}\|_2^2}. \]

If $q$ is close to an eigenvalue, the convergence will be quite rapid, but a pivoting technique
should be used in Step 6 to avoid contamination by round-off error.
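
The starting shift $q$ is a single quotient of two inner products. As a small sketch, with the hypothetical helper name `rayleigh_quotient` and the matrix assumed to be the one from Example 1, exact rational arithmetic gives the value for the starting vector $(1, 1, 1)^t$:

```python
from fractions import Fraction


def rayleigh_quotient(A, x):
    """Return x^t A x / x^t x as an exact fraction (integer inputs assumed)."""
    n = len(A)
    Ax = [sum(A[i][j] * x[j] for j in range(n)) for i in range(n)]
    numerator = sum(x[i] * Ax[i] for i in range(n))
    denominator = sum(t * t for t in x)
    return Fraction(numerator, denominator)


# Assumed matrix (eigenvalues 6, 3, 2) and starting vector.
A = [[-4, 14, 0], [-5, 13, 0], [-1, 0, 2]]
q = rayleigh_quotient(A, [1, 1, 1])
print(q)
```

For this input the quotient is 19/3, which is closer to the eigenvalue 6 than to 3 or 2, so the shifted inverse iteration is drawn toward 6.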

Algorithm 9.3 is often used to approximate an eigenvector when an approximate eigenvalue
$q$ is known.


Inverse Power Method


To approximate an eigenvalue and an associated eigenvector of the $n \times n$ matrix $A$ given a
nonzero vector $\mathbf{x}$:


INPUT dimension $n$; matrix $A$; vector $\mathbf{x}$; tolerance TOL; maximum number of iterations $N$.


OUTPUT approximate eigenvalue $\mu$; approximate eigenvector $\mathbf{x}$ (with $\|\mathbf{x}\|_\infty = 1$) or a
message that the maximum number of iterations was exceeded.

Step 1 Set $q = \dfrac{\mathbf{x}^tA\mathbf{x}}{\mathbf{x}^t\mathbf{x}}$.
Step 2 Set $k = 1$.
Step 3 Find the smallest integer $p$ with $1 \le p \le n$ and $|x_p| = \|\mathbf{x}\|_\infty$.
Step 4 Set $\mathbf{x} = \mathbf{x}/x_p$.
Step 5 While ($k \le N$) do Steps 6-12.
    Step 6 Solve the linear system $(A - qI)\mathbf{y} = \mathbf{x}$.
    Step 7 If the system does not have a unique solution, then
               OUTPUT ('q is an eigenvalue', $q$);
               STOP.
    Step 8 Set $\mu = y_p$.
    Step 9 Find the smallest integer $p$ with $1 \le p \le n$ and $|y_p| = \|\mathbf{y}\|_\infty$.
    Step 10 Set ERR $= \|\mathbf{x} - (\mathbf{y}/y_p)\|_\infty$;
               $\mathbf{x} = \mathbf{y}/y_p$.
    Step 11 If ERR < TOL then set $\mu = (1/\mu) + q$;
               OUTPUT ($\mu$, $\mathbf{x}$);
               (The procedure was successful.)
               STOP.
    Step 12 Set $k = k + 1$.
Step 13 OUTPUT ('Maximum number of iterations exceeded');
        (The procedure was unsuccessful.)
        STOP.
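
A compact version of the Inverse Power method is sketched below. It is an illustration under stated assumptions rather than a reference implementation: `solve` is a hypothetical helper performing Gaussian elimination with partial pivoting (it re-eliminates on every call instead of saving the multipliers), the singularity threshold `1e-14` is an arbitrary choice, and the test matrix is assumed to be the matrix of Example 1.

```python
def solve(M, b):
    """Solve M y = b by Gaussian elimination with partial pivoting."""
    n = len(M)
    aug = [[float(v) for v in M[i]] + [float(b[i])] for i in range(n)]
    for col in range(n):
        # Choose the row with the largest pivot candidate in this column.
        piv = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[piv][col]) < 1e-14:
            raise ValueError("no unique solution: q appears to be an eigenvalue")
        aug[col], aug[piv] = aug[piv], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    # Back substitution.
    y = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(aug[i][j] * y[j] for j in range(i + 1, n))
        y[i] = (aug[i][n] - s) / aug[i][i]
    return y


def inverse_power_method(A, x, tol=1e-8, max_iter=50):
    """Approximate the eigenvalue of A closest to the Rayleigh quotient of x."""
    n = len(A)
    # Step 1: q = x^t A x / x^t x.
    Ax = [sum(A[i][j] * x[j] for j in range(n)) for i in range(n)]
    q = sum(x[i] * Ax[i] for i in range(n)) / sum(t * t for t in x)
    shifted = [[A[i][j] - (q if i == j else 0.0) for j in range(n)] for i in range(n)]
    # Steps 3-4: scale x so that its largest component is 1.
    p = max(range(n), key=lambda i: abs(x[i]))
    x = [t / x[p] for t in x]
    for k in range(1, max_iter + 1):
        y = solve(shifted, x)                        # Steps 6-7
        mu = y[p]                                    # Step 8
        p = max(range(n), key=lambda i: abs(y[i]))   # Step 9
        x_new = [t / y[p] for t in y]                # Step 10
        err = max(abs(a - b) for a, b in zip(x, x_new))
        x = x_new
        if err < tol:                                # Step 11
            return 1.0 / mu + q, x, k
    raise RuntimeError("maximum number of iterations exceeded")


# Assumed test matrix (eigenvalues 6, 3, 2), starting from (1, 1, 1)^t.
A = [[-4, 14, 0], [-5, 13, 0], [-1, 0, 2]]
lam, x, iterations = inverse_power_method(A, [1, 1, 1])
print(lam, x, iterations)
```

In a production setting the factorization of $A - qI$ would be computed once and reused for every solve, as the remark on saving the multipliers suggests.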


The convergence of the Inverse Power method is linear, so Aitken's $\Delta^2$ method can again
be used to speed convergence. The following example illustrates the fast convergence of
the Inverse Power method if $q$ is close to an eigenvalue.


Example 3 Apply the Inverse Power method with $\mathbf{x}^{(0)} = (1, 1, 1)^t$ to the matrix

\[ A = \begin{bmatrix} -4 & 14 & 0 \\ -5 & 13 & 0 \\ -1 & 0 & 2 \end{bmatrix} \quad\text{with}\quad q = \frac{\mathbf{x}^{(0)t}A\mathbf{x}^{(0)}}{\mathbf{x}^{(0)t}\mathbf{x}^{(0)}} = \frac{19}{3}, \]

and use Aitken's $\Delta^2$ method to accelerate the convergence.


Solution The Power method was applied to this matrix in Example 1 using the initial vector
$\mathbf{x}^{(0)} = (1, 1, 1)^t$. It gave the approximate eigenvalue $\mu^{(12)} = 6.000837$ and eigenvector
$(\mathbf{x}^{(12)})^t = (1, 0.714316, -0.249895)$.

For the Inverse Power method we consider

\[ A - qI = \begin{bmatrix} -\frac{31}{3} & 14 & 0 \\ -5 & \frac{20}{3} & 0 \\ -1 & 0 & -\frac{13}{3} \end{bmatrix}. \]

With $\mathbf{x}^{(0)} = (1, 1, 1)^t$, the method first finds $\mathbf{y}^{(1)}$ by solving $(A - qI)\mathbf{y}^{(1)} = \mathbf{x}^{(0)}$. This gives

\[ \mathbf{y}^{(1)} = \left(-\frac{33}{5}, -\frac{24}{5}, \frac{84}{65}\right)^t = (-6.6, -4.8, 1.292307692)^t. \]

So

\[ \|\mathbf{y}^{(1)}\|_\infty = 6.6, \quad \mathbf{x}^{(1)} = \frac{1}{-6.6}\mathbf{y}^{(1)} = (1, 0.7272727, -0.1958042)^t, \]

and

\[ \mu^{(1)} = -\frac{1}{6.6} + \frac{19}{3} = 6.1818182. \]

Subsequent results are listed in Table 9.4, and the right column lists the results of Aitken's
$\Delta^2$ method applied to the $\mu^{(m)}$. These are clearly superior results to those obtained with the
Power method.


Table 9.4

 m    (x^(m))^t                          mu^(m)       mu-hat^(m)
 0    (1, 1, 1)
 1    (1, 0.7272727, -0.1958042)         6.1818182    6.000098
 2    (1, 0.7155172, -0.2450520)         6.0172414    6.000001
 3    (1, 0.7144082, -0.2495224)         6.0017153    6.000000
 4    (1, 0.7142980, -0.2499534)         6.0001714    6.000000
 5    (1, 0.7142869, -0.2499954)         6.0000171
 6    (1, 0.7142858, -0.2499996)         6.0000017


If $A$ is symmetric, then for any real number $q$, the matrix $(A - qI)^{-1}$ is also symmetric,
so the Symmetric Power method, Algorithm 9.2, can be applied to $(A - qI)^{-1}$ to speed the
convergence to

\[ O\left(\left|\frac{\lambda_k - q}{\lambda - q}\right|^{2m}\right). \]


Deflation Methods 


Numerous techniques are available for obtaining approximations to the other eigenvalues 
of a matrix once an approximation to the dominant eigenvalue has been computed. We will 
restrict our presentation to deflation techniques. 

Deflation techniques involve forming a new matrix $B$ whose eigenvalues are the same
as those of $A$, except that the dominant eigenvalue of $A$ is replaced by the eigenvalue 0 in
$B$. The following result justifies the procedure. The proof of this theorem can be found in
[Wil2], p. 596.


Theorem 9.20 Suppose $\lambda_1, \lambda_2, \ldots, \lambda_n$ are eigenvalues of $A$ with associated eigenvectors $\mathbf{v}^{(1)}, \mathbf{v}^{(2)}, \ldots, \mathbf{v}^{(n)}$
and that $\lambda_1$ has multiplicity 1. Let $\mathbf{x}$ be a vector with $\mathbf{x}^t\mathbf{v}^{(1)} = 1$. Then the matrix

\[ B = A - \lambda_1\mathbf{v}^{(1)}\mathbf{x}^t \]

has eigenvalues $0, \lambda_2, \lambda_3, \ldots, \lambda_n$ with associated eigenvectors $\mathbf{v}^{(1)}, \mathbf{w}^{(2)}, \mathbf{w}^{(3)}, \ldots, \mathbf{w}^{(n)}$,
where $\mathbf{v}^{(i)}$ and $\mathbf{w}^{(i)}$ are related by the equation

\[ \mathbf{v}^{(i)} = (\lambda_i - \lambda_1)\mathbf{w}^{(i)} + \lambda_1(\mathbf{x}^t\mathbf{w}^{(i)})\mathbf{v}^{(1)}, \tag{9.6} \]

for each $i = 2, 3, \ldots, n$.


Margin note: Helmut Wielandt (1910-2001) originally worked in permutation groups, but during World War II he was engaged in research on meteorology, cryptology, and aerodynamics. This involved vibration problems that required the estimation of eigenvalues associated with differential equations and matrices.


There are many choices of the vector $\mathbf{x}$ that could be used in Theorem 9.20. Wielandt
deflation proceeds from defining

\[ \mathbf{x} = \frac{1}{\lambda_1v^{(1)}_i}(a_{i1}, a_{i2}, \ldots, a_{in})^t, \tag{9.7} \]

where $v^{(1)}_i$ is a nonzero coordinate of the eigenvector $\mathbf{v}^{(1)}$, and the values $a_{i1}, a_{i2}, \ldots, a_{in}$ are
the entries in the $i$th row of $A$.
With this definition,

\[ \mathbf{x}^t\mathbf{v}^{(1)} = \frac{1}{\lambda_1v^{(1)}_i}[a_{i1}, a_{i2}, \ldots, a_{in}](v^{(1)}_1, v^{(1)}_2, \ldots, v^{(1)}_n)^t = \frac{1}{\lambda_1v^{(1)}_i}\sum_{j=1}^{n} a_{ij}v^{(1)}_j, \]

where the sum is the $i$th coordinate of the product $A\mathbf{v}^{(1)}$. Since $A\mathbf{v}^{(1)} = \lambda_1\mathbf{v}^{(1)}$, we have

\[ \sum_{j=1}^{n} a_{ij}v^{(1)}_j = \lambda_1v^{(1)}_i, \]

which implies that

\[ \mathbf{x}^t\mathbf{v}^{(1)} = \frac{1}{\lambda_1v^{(1)}_i}(\lambda_1v^{(1)}_i) = 1. \]

So $\mathbf{x}$ satisfies the hypotheses of Theorem 9.20. Moreover (see Exercise 20), the $i$th row of
$B = A - \lambda_1\mathbf{v}^{(1)}\mathbf{x}^t$ consists entirely of zero entries.

If $\lambda \neq 0$ is an eigenvalue with associated eigenvector $\mathbf{w}$, the relation $B\mathbf{w} = \lambda\mathbf{w}$ implies
that the $i$th coordinate of $\mathbf{w}$ must also be zero. Consequently the $i$th column of the matrix
$B$ makes no contribution to the product $B\mathbf{w} = \lambda\mathbf{w}$. Thus, the matrix $B$ can be replaced by
an $(n - 1) \times (n - 1)$ matrix $B'$ obtained by deleting the $i$th row and column from $B$. The
matrix $B'$ has eigenvalues $\lambda_2, \lambda_3, \ldots, \lambda_n$.

If $|\lambda_2| > |\lambda_3|$, the Power method is reapplied to the matrix $B'$ to determine this new
dominant eigenvalue and an eigenvector, $\mathbf{w}^{(2)\prime}$, associated with $\lambda_2$, with respect to the matrix
$B'$. To find the associated eigenvector $\mathbf{w}^{(2)}$ for the matrix $B$, insert a zero coordinate between
the coordinates $w^{(2)\prime}_{i-1}$ and $w^{(2)\prime}_i$ of the $(n - 1)$-dimensional vector $\mathbf{w}^{(2)\prime}$ and then calculate
$\mathbf{v}^{(2)}$ by the use of Eq. (9.6).


Example 4 The matrix

\[ A = \begin{bmatrix} 4 & -1 & 1 \\ -1 & 3 & -2 \\ 1 & -2 & 3 \end{bmatrix} \]

has the dominant eigenvalue $\lambda_1 = 6$ with associated unit eigenvector $\mathbf{v}^{(1)} = (1, -1, 1)^t$.
Assume that this dominant eigenvalue is known and apply deflation to approximate the
other eigenvalues and eigenvectors.

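Solution The procedure for obtaining a second eigenvalue $\lambda_2$ proceeds as follows:

\[ \mathbf{x} = \frac{1}{6}\begin{bmatrix} 4 \\ -1 \\ 1 \end{bmatrix} = \left(\frac{2}{3}, -\frac{1}{6}, \frac{1}{6}\right)^t, \]

\[ \mathbf{v}^{(1)}\mathbf{x}^t = \begin{bmatrix} 1 \\ -1 \\ 1 \end{bmatrix}\left[\frac{2}{3}, -\frac{1}{6}, \frac{1}{6}\right] = \begin{bmatrix} \frac{2}{3} & -\frac{1}{6} & \frac{1}{6} \\ -\frac{2}{3} & \frac{1}{6} & -\frac{1}{6} \\ \frac{2}{3} & -\frac{1}{6} & \frac{1}{6} \end{bmatrix}, \]

and

\[ B = A - \lambda_1\mathbf{v}^{(1)}\mathbf{x}^t = \begin{bmatrix} 4 & -1 & 1 \\ -1 & 3 & -2 \\ 1 & -2 & 3 \end{bmatrix} - 6\begin{bmatrix} \frac{2}{3} & -\frac{1}{6} & \frac{1}{6} \\ -\frac{2}{3} & \frac{1}{6} & -\frac{1}{6} \\ \frac{2}{3} & -\frac{1}{6} & \frac{1}{6} \end{bmatrix} = \begin{bmatrix} 0 & 0 & 0 \\ 3 & 2 & -1 \\ -3 & -1 & 2 \end{bmatrix}. \]

Deleting the first row and column gives

\[ B' = \begin{bmatrix} 2 & -1 \\ -1 & 2 \end{bmatrix}, \]

which has eigenvalues $\lambda_2 = 3$ and $\lambda_3 = 1$. For $\lambda_2 = 3$, the eigenvector $\mathbf{w}^{(2)\prime}$ can be obtained
by solving the linear system

\[ (B' - 3I)\mathbf{w}^{(2)\prime} = \mathbf{0}, \quad\text{resulting in}\quad \mathbf{w}^{(2)\prime} = (1, -1)^t. \]

Adding a zero for the first component gives $\mathbf{w}^{(2)} = (0, 1, -1)^t$ and, from Eq. (9.6), we have
the eigenvector $\mathbf{v}^{(2)}$ of $A$ corresponding to $\lambda_2 = 3$:

\[ \mathbf{v}^{(2)} = (\lambda_2 - \lambda_1)\mathbf{w}^{(2)} + \lambda_1(\mathbf{x}^t\mathbf{w}^{(2)})\mathbf{v}^{(1)} = (3 - 6)(0, 1, -1)^t + 6\left[\left(\frac{2}{3}, -\frac{1}{6}, \frac{1}{6}\right)(0, 1, -1)^t\right](1, -1, 1)^t = (-2, -1, 1)^t. \]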

Although this deflation process can be used to find approximations to all of the eigen- 
values and eigenvectors of a matrix, the process is susceptible to round-off error. After 
deflation is used to approximate an eigenvalue of a matrix, the approximation should be 
used as a starting value for the Inverse Power method applied to the original matrix. This 
will ensure convergence to an eigenvalue of the original matrix, not to one of the reduced 
matrix, which likely contains errors. When all the eigenvalues of a matrix are required, 
techniques considered in Section 9.5, based on similarity transformations, should be used. 

We close this section with Algorithm 9.4, which calculates the second most dominant 
eigenvalue and associated eigenvector for a matrix, once the dominant eigenvalue and 
associated eigenvector have been determined. 
Wielandt Deflation


To approximate the second most dominant eigenvalue and an associated eigenvector of the
$n \times n$ matrix $A$ given an approximation $\lambda$ to the dominant eigenvalue, an approximation $\mathbf{v}$
to a corresponding eigenvector, and a vector $\mathbf{x} \in \mathbb{R}^{n-1}$:


INPUT dimension $n$; matrix $A$; approximate eigenvalue $\lambda$ with eigenvector $\mathbf{v} \in \mathbb{R}^{n}$; vector
$\mathbf{x} \in \mathbb{R}^{n-1}$; tolerance TOL; maximum number of iterations $N$.


OUTPUT approximate eigenvalue $\mu$; approximate eigenvector $\mathbf{u}$ or a message that the
method fails.


Step 1 Let $i$ be the smallest integer with $1 \le i \le n$ and $|v_i| = \max_{1 \le j \le n} |v_j|$.

Step 2 If $i \neq 1$ then
for $k = 1, \ldots, i-1$
for $j = 1, \ldots, i-1$
set $b_{kj} = a_{kj} - \dfrac{v_k}{v_i} a_{ij}$.

Step 3 If $i \neq 1$ and $i \neq n$ then
for $k = i, \ldots, n-1$
for $j = 1, \ldots, i-1$
set $b_{kj} = a_{k+1,j} - \dfrac{v_{k+1}}{v_i} a_{ij}$;
$b_{jk} = a_{j,k+1} - \dfrac{v_j}{v_i} a_{i,k+1}$.

Step 4 If $i \neq n$ then
for $k = i, \ldots, n-1$
for $j = i, \ldots, n-1$
set $b_{kj} = a_{k+1,j+1} - \dfrac{v_{k+1}}{v_i} a_{i,j+1}$.

Step 5 Perform the power method on the $(n-1) \times (n-1)$ matrix $B' = (b_{kj})$ with $\mathbf{x}$ as
initial approximation.

Step 6 If the method fails, then OUTPUT ('Method fails');
STOP
else let $\mu$ be the approximate eigenvalue and
$\mathbf{w}' = (w'_1, \ldots, w'_{n-1})^t$ the approximate eigenvector.

Step 7 If $i \neq 1$ then for $k = 1, \ldots, i-1$ set $w_k = w'_k$.

Step 8 Set $w_i = 0$.

Step 9 If $i \neq n$ then for $k = i+1, \ldots, n$ set $w_k = w'_{k-1}$.

Step 10 For $k = 1, \ldots, n$
set $u_k = (\mu - \lambda) w_k + \left( \displaystyle\sum_{j=1}^{n} a_{ij} w_j \right) \dfrac{v_k}{v_i}$.
(Compute the eigenvector using Eq. (9.6).)

Step 11 OUTPUT $(\mu, \mathbf{u})$; (The procedure was successful.)
STOP.
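The steps of the algorithm can be sketched in Python as follows. This is an illustrative sketch under stated assumptions, not the book's code: the function names `power_method` and `wielandt_deflation` are hypothetical, Steps 2 through 4 are collapsed into one expression that forms $B = A - (1/v_i)\mathbf{v}\,\mathbf{a}_i^t$ with row and column $i$ removed, and the power method uses the infinity-norm scaling with a simple stopping test on successive vectors.

```python
def power_method(M, x, tol, max_iter):
    """Power method with infinity-norm scaling; returns (mu, x) or None on failure."""
    n = len(M)
    p = max(range(n), key=lambda j: abs(x[j]))      # smallest index of the largest entry
    x = [xj / x[p] for xj in x]
    for _ in range(max_iter):
        y = [sum(M[r][c] * x[c] for c in range(n)) for r in range(n)]
        mu = y[p]
        p = max(range(n), key=lambda j: abs(y[j]))
        if y[p] == 0:
            return None                             # zero vector: pick another start
        x_new = [yj / y[p] for yj in y]
        err = max(abs(s - t) for s, t in zip(x, x_new))
        x = x_new
        if err < tol:
            return mu, x
    return None


def wielandt_deflation(A, lam, v, x0, tol=1e-10, max_iter=200):
    """Second most dominant eigenpair of A from a known dominant pair (lam, v)."""
    n = len(A)
    i = max(range(n), key=lambda j: abs(v[j]))      # Step 1
    keep = [k for k in range(n) if k != i]
    # Steps 2-4: entries of B' (row and column i of B are dropped)
    Bp = [[A[k][j] - v[k] / v[i] * A[i][j] for j in keep] for k in keep]
    result = power_method(Bp, x0, tol, max_iter)    # Step 5
    if result is None:
        return None                                 # Step 6: method fails
    mu, wp = result
    w = wp[:i] + [0.0] + wp[i:]                     # Steps 7-9: insert the zero coordinate
    s = sum(A[i][j] * w[j] for j in range(n)) / v[i]
    u = [(mu - lam) * w[k] + s * v[k] for k in range(n)]   # Step 10, Eq. (9.6)
    return mu, u


A = [[4, -1, 1], [-1, 3, -2], [1, -2, 3]]
mu, u = wielandt_deflation(A, 6, [1, -1, 1], [1.0, 0.0])
print(mu, u)
```

Applied to the matrix of the preceding example, the sketch should reproduce $\lambda_2 = 3$ and an eigenvector proportional to $(-2, -1, 1)^t$, up to the tolerance.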
EXERCISE SET 9.3


1. 


Find the first three iterations obtained by the Power method applied to the following matrices. 


> i 4 a | 
a ee a b. 1104; 
i ad 2 ae a 
Use x = (1,—1,2)'. Use x = (—1,0, 1) 
1-1 O A 1% 24 
c = ae ee iL eet th 
0 -1 2 1 1 -1 2 0 | 
Use x = (—1,2, 1). oe 


Use x = (1, —2,0,3)'. 


Find the first three iterations obtained by the Power method applied to the following matrices. 


4 2 1 1 1 0 0 
a 0 3 2 |; b 12 0 1 
1 1 4 : 0 0 3 3)’ 
Use x® = (1,2, 1) 0 1 3 2 
Use x = (1, 1,0, 1’. 
ao 4 re 
2 5 3) 1 9 9 2} 
Lp bas) Lota] 
3 —4 —2 5 0 1 1 4 


Use x = (1, 1,0, —3)'. 


Repeat Exercise 1 using the Inverse Power method.


Use x = (0,0,0, 1). 


Repeat Exercise 2 using the Inverse Power method. 


Find the first three iterations obtained by the Symmetric Power method applied to the following 
matrices. 


2 1 1 1 1 1 
a. 1 2 1 |; b. 1 1 Of; 
11 2 1 0 1 
Use x® = (1,—1,2)'. Use x = (-1,0, 1)’. 
4.75 2.25  —0.25 4 1 -1l O 
c. 2.25 4.75 1.25 |; 1 3 -l1 0O 
—0.25 1.25 4.75 


a: ay =f 3S oi 
6 0 2 4 
Use x = (0, 1,0, 0)’. 


Find the first three iterations obtained by the Symmetric Power method applied to the following 
matrices. 


Use x = (0, 1,0). 


—2 1 3 4 2 -1 
a. 1 3 -1 |; b. 2 0O 2 : 
3 -l 2 -1 2 0O 
Use x = (1,—1,2)!. Use x = (—1,0, 1)’. 
| 4 1 11 ] 5 —2 -; 3 
1 3-1 1 |. 2 5 By 
- Lh, Oe d. i ao ep 
1 1 0 2 ~2 2 ~ 
3 -1 2 § 
Use x® = (1,0,0, 0)’. 2 
7. Use the Power method to approximate the most dominant eigenvalue of the matrices in Exercise 1.
Iterate until a tolerance of $10^{-4}$ is achieved or until the number of iterations exceeds 25.

8. Use the Power method to approximate the most dominant eigenvalue of the matrices in Exercise 2.
Iterate until a tolerance of $10^{-4}$ is achieved or until the number of iterations exceeds 25.

9. Use the Inverse Power method to approximate the most dominant eigenvalue of the matrices
in Exercise 1. Iterate until a tolerance of $10^{-4}$ is achieved or until the number of iterations
exceeds 25.

10. Use the Inverse Power method to approximate the most dominant eigenvalue of the matrices in
Exercise 2. Iterate until a tolerance of $10^{-4}$ is achieved or until the number of iterations exceeds 25.

11. Use the Symmetric Power method to approximate the most dominant eigenvalue of the matrices
in Exercise 5. Iterate until a tolerance of $10^{-4}$ is achieved or until the number of iterations
exceeds 25.

12. Use the Symmetric Power method to approximate the most dominant eigenvalue of the matrices
in Exercise 6. Iterate until a tolerance of $10^{-4}$ is achieved or until the number of iterations
exceeds 25.

13. Use Wielandt deflation and the results of Exercise 7 to approximate the second most dominant
eigenvalue of the matrices in Exercise 1. Iterate until a tolerance of $10^{-4}$ is achieved or until the number of
iterations exceeds 25.

14. Use Wielandt deflation and the results of Exercise 8 to approximate the second most dominant
eigenvalue of the matrices in Exercise 2. Iterate until a tolerance of $10^{-4}$ is achieved or until the number of
iterations exceeds 25.

15. Repeat Exercise 7 using Aitken's $\Delta^2$ technique and the Power method for the most dominant
eigenvalue.

16. Repeat Exercise 8 using Aitken's $\Delta^2$ technique and the Power method for the most dominant
eigenvalue.


17. Hotelling Deflation Assume that the largest eigenvalue $\lambda_1$ in magnitude and an associated
eigenvector $\mathbf{v}^{(1)}$ have been obtained for the $n \times n$ symmetric matrix $A$. Show that the matrix
\[
B = A - \frac{\lambda_1}{(\mathbf{v}^{(1)})^t\mathbf{v}^{(1)}}\,\mathbf{v}^{(1)}(\mathbf{v}^{(1)})^t
\]
has the same eigenvalues $\lambda_2, \ldots, \lambda_n$ as $A$, except that $B$ has eigenvalue 0 with eigenvector $\mathbf{v}^{(1)}$ instead
of eigenvalue $\lambda_1$. Use this deflation method to find $\lambda_2$ for each matrix in Exercise 5. Theoretically,
this method can be continued to find more eigenvalues, but round-off error soon makes the effort
worthless.


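18. Annihilation Technique Suppose the $n \times n$ matrix $A$ has eigenvalues $\lambda_1, \ldots, \lambda_n$ ordered by
\[
|\lambda_1| > |\lambda_2| > |\lambda_3| \ge \cdots \ge |\lambda_n|,
\]
with linearly independent eigenvectors $\mathbf{v}^{(1)}, \mathbf{v}^{(2)}, \ldots, \mathbf{v}^{(n)}$.

a. Show that if the Power method is applied with an initial vector $\mathbf{x}^{(0)}$ given by
\[
\mathbf{x}^{(0)} = \beta_2\mathbf{v}^{(2)} + \beta_3\mathbf{v}^{(3)} + \cdots + \beta_n\mathbf{v}^{(n)},
\]
then the sequence $\{\mu^{(m)}\}$ described in Algorithm 9.1 will converge to $\lambda_2$.
b. Show that for any vector $\mathbf{x} = \sum_{i=1}^{n} \beta_i\mathbf{v}^{(i)}$, the vector $\mathbf{x}^{(0)} = (A - \lambda_1 I)\mathbf{x}$ satisfies the property
given in part (a).
c. Obtain an approximation to $\lambda_2$ for the matrices in Exercise 1.
d. Show that this method can be continued to find $\lambda_3$ using $\mathbf{x}^{(0)} = (A - \lambda_2 I)(A - \lambda_1 I)\mathbf{x}$.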
Following along the line of Exercise 11 in Section 6.3 and Exercise 15 in Section 7.2, suppose that a 
species of beetle has a life span of 4 years, and that a female in the first year has a survival rate of 5, in 


the second year a survival rate of ie and in the third year a survival rate of i. Suppose additionally that 
a female gives birth, on the average, to two new females in the third year and to four new females in 
the fourth year. The matrix describing a single female's contribution in 1 year to the female population
in the succeeding year is


of OO 
2e-OON 
of 
————"] 


Co Ono 


0 |? 
0 
where again the entry in the ith row and jth column denotes the probabilistic contribution that a female 


of age j makes on the next year’s female population of age i. 


a. Use the Geršgorin Circle Theorem to determine a region in the complex plane containing all the
eigenvalues of A.


b. Use the Power method to determine the dominant eigenvalue of the matrix and its associated 
eigenvector. 


c. Use Algorithm 9.4 to determine any remaining eigenvalues and eigenvectors of A. 
d. Find the eigenvalues of A by using the characteristic polynomial of A and Newton’s method. 
e. What is your long-range prediction for the population of these beetles? 


20. Show that the $i$th row of $B = A - \lambda_1\mathbf{v}^{(1)}\mathbf{x}^t$ is zero, where $\lambda_1$ is the largest value of $A$ in absolute value,
$\mathbf{v}^{(1)}$ is the associated eigenvector of $A$ for $\lambda_1$, and $\mathbf{x}$ is the vector defined in Eq. (9.7).


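21. The $(m-1) \times (m-1)$ tridiagonal matrix
\[
A = \begin{bmatrix}
1 + 2\alpha & -\alpha & 0 & \cdots & 0 \\
-\alpha & 1 + 2\alpha & -\alpha & \ddots & \vdots \\
0 & \ddots & \ddots & \ddots & 0 \\
\vdots & \ddots & \ddots & \ddots & -\alpha \\
0 & \cdots & 0 & -\alpha & 1 + 2\alpha
\end{bmatrix}
\]
is involved in the Backward Difference method to solve the heat equation. (See Section 12.2.) For
the stability of the method we need $\rho(A^{-1}) < 1$. With $m = 11$, approximate $\rho(A^{-1})$ for each of the
following.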

a a= 7 b a= 
When is the method stable? 


22. The eigenvalues of the matrix $A$ in Exercise 21 are
\[
\mu_i = 1 + 4\alpha\left(\sin\frac{\pi i}{2m}\right)^2, \quad \text{for } i = 1, \ldots, m-1.
\]
Compare the approximation in Exercise 21 to the actual value of $\rho(A^{-1})$. Again, when is the method
stable?


The (m — 1) x (m— 1) matrices A and B given by 


+ 
g 
Nig 
S 
ro) 


Ita -¢% Ope sree ee eee 0 


- — 
at 
LR 
wie 


| Ovstendarece ns 0 -¢ i en ee 0 x wea 
are involved in the Crank-Nicolson method to solve the heat equation (see Section 12.2). With m = 11, 
approximate o(A~'B) for each of the following. 
a. $\alpha = \frac{1}{4}$    b. $\alpha = \frac{1}{2}$    c. $\alpha = \frac{3}{4}$


24. A linear dynamical system can be represented by the equations 


\[
\frac{d\mathbf{x}}{dt} = A(t)\mathbf{x}(t) + B(t)\mathbf{u}(t), \qquad \mathbf{y}(t) = C(t)\mathbf{x}(t) + D(t)\mathbf{u}(t),
\]
where A is ann X n variable matrix, B is an n x r variable matrix, C is an m x n variable matrix, 
D is an m x r variable matrix, x is an n-dimensional vector variable, y is an m-dimensional vector 
variable, and u is an r-dimensional vector variable. For the system to be stable, the matrix A must 
have all its eigenvalues with nonpositive real part for all t. Is the system stable if 


a b. 
1 2 0 2 a. 8 
A(t)=| -25 -7 4 |? A(t) = OG face Ee 
0 0 —5 


9.4 Householder's Method


In Section 9.5 we will use the QR method to reduce a symmetric tridiagonal matrix to 
a similar matrix that is nearly diagonal. The diagonal entries of the reduced matrix are 
approximations to the eigenvalues of the given matrix. In this section, we present a method 
devised by Alston Householder for reducing an arbitrary symmetric matrix to a similar 
tridiagonal matrix. Although there is a clear connection between the problems we are solving 
in these two sections, Householder's method has such a wide application in areas other than
eigenvalue approximation that it deserves special treatment.

Householder's method is used to find a symmetric tridiagonal matrix $B$ that is similar to
a given symmetric matrix $A$. Theorem 9.16 implies that $A$ is similar to a diagonal matrix $D$
since an orthogonal matrix $Q$ exists with the property that $D = Q^{-1}AQ = Q^tAQ$. Because
the matrix $Q$ (and consequently $D$) is generally difficult to compute, Householder's method
offers a compromise. After Householder's method has been implemented, efficient methods
such as the QR algorithm can be used for accurate approximation of the eigenvalues of the
resulting symmetric tridiagonal matrix.

Alston Householder (1904-1993) did research in mathematical biology before becoming the
Director of the Oak Ridge National Laboratory in Tennessee in 1948. He began work on
solving linear systems in the 1950s, which was when these methods were developed.


Householder Transformations 
Definition 9.21 Let $\mathbf{w} \in \mathbb{R}^n$ with $\mathbf{w}^t\mathbf{w} = 1$. The $n \times n$ matrix
\[
P = I - 2\mathbf{w}\mathbf{w}^t
\]
is called a Householder transformation.


Householder transformations are used to selectively zero out blocks of entries in vectors 
or columns of matrices in a manner that is extremely stable with respect to round-off error. 
(See [Wil2], pp. 152-162, for further discussion.) Properties of Householder transformations 
are given in the following theorem. 


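Theorem 9.22 A Householder transformation, $P = I - 2\mathbf{w}\mathbf{w}^t$, is symmetric and orthogonal, so
$P^{-1} = P$.


Proof It follows from
\[
(\mathbf{w}\mathbf{w}^t)^t = (\mathbf{w}^t)^t\mathbf{w}^t = \mathbf{w}\mathbf{w}^t
\]
that
\[
P^t = (I - 2\mathbf{w}\mathbf{w}^t)^t = I - 2\mathbf{w}\mathbf{w}^t = P.
\]
Further, $\mathbf{w}^t\mathbf{w} = 1$, so
\[
PP^t = (I - 2\mathbf{w}\mathbf{w}^t)(I - 2\mathbf{w}\mathbf{w}^t) = I - 2\mathbf{w}\mathbf{w}^t - 2\mathbf{w}\mathbf{w}^t + 4\mathbf{w}\mathbf{w}^t\mathbf{w}\mathbf{w}^t
= I - 4\mathbf{w}\mathbf{w}^t + 4\mathbf{w}\mathbf{w}^t = I,
\]
and $P^{-1} = P^t = P$.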
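The two properties in the theorem can be checked on a small instance. The sketch below is illustrative only; the unit vector $\mathbf{w} = (1/3, 2/3, 2/3)^t$ and the helper names are assumptions chosen for the demonstration, and exact fractions are used so that the equalities hold without round-off.

```python
from fractions import Fraction as F


def householder_matrix(w):
    """Return P = I - 2 w w^t for a vector w with w^t w = 1."""
    n = len(w)
    return [[(1 if r == c else 0) - 2 * w[r] * w[c] for c in range(n)] for r in range(n)]


def matmul(X, Y):
    """Plain triple-loop matrix product of two square matrices."""
    n = len(X)
    return [[sum(X[r][k] * Y[k][c] for k in range(n)) for c in range(n)] for r in range(n)]


w = [F(1, 3), F(2, 3), F(2, 3)]          # w^t w = 1/9 + 4/9 + 4/9 = 1
P = householder_matrix(w)
P_t = [list(col) for col in zip(*P)]     # transpose of P
identity = [[1 if r == c else 0 for c in range(3)] for r in range(3)]

print(P == P_t)                # symmetry
print(matmul(P, P) == identity)  # P P^t = P P = I
```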


Householder's method begins by determining a transformation $P^{(1)}$ with the property
that $A^{(2)} = P^{(1)}AP^{(1)}$ zeros out the entries in the first column of $A$ beginning with the third
row. That is, such that
\[
a^{(2)}_{j1} = 0, \quad \text{for each } j = 3, 4, \ldots, n. \tag{9.8}
\]
By symmetry, we also have $a^{(2)}_{1j} = 0$.
We now choose a vector $\mathbf{w} = (w_1, w_2, \ldots, w_n)^t$ so that $\mathbf{w}^t\mathbf{w} = 1$, Eq. (9.8) holds, and
in the matrix
\[
A^{(2)} = P^{(1)}AP^{(1)} = (I - 2\mathbf{w}\mathbf{w}^t)A(I - 2\mathbf{w}\mathbf{w}^t),
\]
we have $a^{(2)}_{11} = a_{11}$ and $a^{(2)}_{j1} = 0$, for each $j = 3, 4, \ldots, n$. This choice imposes $n$ conditions
on the $n$ unknowns $w_1, w_2, \ldots, w_n$.


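Setting $w_1 = 0$ ensures that $a^{(2)}_{11} = a_{11}$. We want
\[
P = I - 2\mathbf{w}\mathbf{w}^t
\]
to satisfy
\[
P \cdot (a_{11}, a_{21}, a_{31}, \ldots, a_{n1})^t = (a_{11}, \alpha, 0, \ldots, 0)^t, \tag{9.9}
\]
where $\alpha$ will be chosen later. To simplify notation, let
\[
\hat{\mathbf{w}} = (w_2, w_3, \ldots, w_n)^t \in \mathbb{R}^{n-1}, \qquad \hat{\mathbf{y}} = (a_{21}, a_{31}, \ldots, a_{n1})^t \in \mathbb{R}^{n-1},
\]
and $\hat{P}$ be the $(n-1) \times (n-1)$ Householder transformation
\[
\hat{P} = I_{n-1} - 2\hat{\mathbf{w}}\hat{\mathbf{w}}^t.
\]
Eq. (9.9) then becomes
\[
P \cdot \begin{bmatrix} a_{11} \\ a_{21} \\ \vdots \\ a_{n1} \end{bmatrix}
= \begin{bmatrix} 1 & \mathbf{0}^t \\ \mathbf{0} & \hat{P} \end{bmatrix}
\cdot \begin{bmatrix} a_{11} \\ \hat{\mathbf{y}} \end{bmatrix}
= \begin{bmatrix} a_{11} \\ \hat{P}\hat{\mathbf{y}} \end{bmatrix}
= \begin{bmatrix} a_{11} \\ \alpha \\ 0 \\ \vdots \\ 0 \end{bmatrix}
\]
with
\[
\hat{P}\hat{\mathbf{y}} = (I_{n-1} - 2\hat{\mathbf{w}}\hat{\mathbf{w}}^t)\hat{\mathbf{y}} = \hat{\mathbf{y}} - 2(\hat{\mathbf{w}}^t\hat{\mathbf{y}})\hat{\mathbf{w}} = (\alpha, 0, \ldots, 0)^t. \tag{9.10}
\]
Let $r = \hat{\mathbf{w}}^t\hat{\mathbf{y}}$. Then
\[
(\alpha, 0, \ldots, 0)^t = (a_{21} - 2rw_2,\; a_{31} - 2rw_3,\; \ldots,\; a_{n1} - 2rw_n)^t,
\]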
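and we can determine all of the $w_i$ once we know $\alpha$ and $r$. Equating components gives
\[
\alpha = a_{21} - 2rw_2
\]
and
\[
0 = a_{j1} - 2rw_j, \quad \text{for each } j = 3, \ldots, n.
\]
Thus
\[
2rw_2 = a_{21} - \alpha \tag{9.11}
\]
and
\[
2rw_j = a_{j1}, \quad \text{for each } j = 3, \ldots, n. \tag{9.12}
\]
Squaring both sides of each of the equations and adding corresponding terms gives
\[
4r^2\sum_{j=2}^{n} w_j^2 = (a_{21} - \alpha)^2 + \sum_{j=3}^{n} a_{j1}^2.
\]
Since $\mathbf{w}^t\mathbf{w} = 1$ and $w_1 = 0$, we have $\sum_{j=2}^{n} w_j^2 = 1$, and
\[
4r^2 = \sum_{j=2}^{n} a_{j1}^2 - 2\alpha a_{21} + \alpha^2. \tag{9.13}
\]
Equation (9.10) and the fact that $\hat{P}$ is orthogonal imply that
\[
\alpha^2 = (\alpha, 0, \ldots, 0)(\alpha, 0, \ldots, 0)^t = (\hat{P}\hat{\mathbf{y}})^t\hat{P}\hat{\mathbf{y}} = \hat{\mathbf{y}}^t\hat{P}^t\hat{P}\hat{\mathbf{y}} = \hat{\mathbf{y}}^t\hat{\mathbf{y}}.
\]
Thus
\[
\alpha^2 = \sum_{j=2}^{n} a_{j1}^2,
\]
which, when substituted into Eq. (9.13), gives
\[
2r^2 = \sum_{j=2}^{n} a_{j1}^2 - \alpha a_{21}.
\]
To ensure that $2r^2 = 0$ only if $a_{21} = a_{31} = \cdots = a_{n1} = 0$, we choose
\[
\alpha = -\operatorname{sgn}(a_{21})\left(\sum_{j=2}^{n} a_{j1}^2\right)^{1/2},
\]
which implies that
\[
2r^2 = \sum_{j=2}^{n} a_{j1}^2 + |a_{21}|\left(\sum_{j=2}^{n} a_{j1}^2\right)^{1/2}.
\]
With this choice of $\alpha$ and $2r^2$, we solve Eqs. (9.11) and (9.12) to obtain
\[
w_2 = \frac{a_{21} - \alpha}{2r} \quad \text{and} \quad w_j = \frac{a_{j1}}{2r}, \quad \text{for each } j = 3, \ldots, n.
\]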
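To summarize the choice of $P^{(1)}$, we have
\[
\alpha = -\operatorname{sgn}(a_{21})\left(\sum_{j=2}^{n} a_{j1}^2\right)^{1/2},
\]
\[
r = \left(\frac{1}{2}\alpha^2 - \frac{1}{2}a_{21}\alpha\right)^{1/2},
\]
\[
w_1 = 0,
\]
\[
w_2 = \frac{a_{21} - \alpha}{2r},
\]
and
\[
w_j = \frac{a_{j1}}{2r}, \quad \text{for each } j = 3, \ldots, n.
\]
With this choice,
\[
A^{(2)} = P^{(1)}AP^{(1)} = \begin{bmatrix}
a^{(2)}_{11} & a^{(2)}_{12} & 0 & \cdots & 0 \\
a^{(2)}_{21} & a^{(2)}_{22} & a^{(2)}_{23} & \cdots & a^{(2)}_{2n} \\
0 & a^{(2)}_{32} & a^{(2)}_{33} & \cdots & a^{(2)}_{3n} \\
\vdots & \vdots & \vdots & & \vdots \\
0 & a^{(2)}_{n2} & a^{(2)}_{n3} & \cdots & a^{(2)}_{nn}
\end{bmatrix}.
\]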
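Having found $P^{(1)}$ and computed $A^{(2)}$, the process is repeated for $k = 2, 3, \ldots, n-2$ as follows:
\[
\alpha = -\operatorname{sgn}\left(a^{(k)}_{k+1,k}\right)\left(\sum_{j=k+1}^{n} \left(a^{(k)}_{jk}\right)^2\right)^{1/2},
\]
\[
r = \left(\frac{1}{2}\alpha^2 - \frac{1}{2}\alpha\, a^{(k)}_{k+1,k}\right)^{1/2},
\]
\[
w^{(k)}_1 = w^{(k)}_2 = \cdots = w^{(k)}_k = 0,
\]
\[
w^{(k)}_{k+1} = \frac{a^{(k)}_{k+1,k} - \alpha}{2r},
\]
\[
w^{(k)}_j = \frac{a^{(k)}_{jk}}{2r}, \quad \text{for each } j = k+2, k+3, \ldots, n,
\]
\[
P^{(k)} = I - 2\mathbf{w}^{(k)} \cdot (\mathbf{w}^{(k)})^t,
\]
and
\[
A^{(k+1)} = P^{(k)}A^{(k)}P^{(k)},
\]
where
\[
A^{(k+1)} = \begin{bmatrix}
a^{(k+1)}_{11} & a^{(k+1)}_{12} & 0 & \cdots & \cdots & \cdots & 0 \\
a^{(k+1)}_{21} & \ddots & \ddots & \ddots & & & \vdots \\
0 & \ddots & \ddots & a^{(k+1)}_{k,k+1} & 0 & \cdots & 0 \\
\vdots & \ddots & a^{(k+1)}_{k+1,k} & a^{(k+1)}_{k+1,k+1} & a^{(k+1)}_{k+1,k+2} & \cdots & a^{(k+1)}_{k+1,n} \\
\vdots & & 0 & \vdots & & & \vdots \\
0 & \cdots & 0 & a^{(k+1)}_{n,k+1} & \cdots & \cdots & a^{(k+1)}_{nn}
\end{bmatrix}.
\]
Continuing in this manner, the tridiagonal and symmetric matrix $A^{(n-1)}$ is formed,
where
\[
A^{(n-1)} = P^{(n-2)}P^{(n-3)} \cdots P^{(1)}AP^{(1)} \cdots P^{(n-3)}P^{(n-2)}.
\]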


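Example 1 Apply Householder transformations to the symmetric $4 \times 4$ matrix
\[
A = \begin{bmatrix} 4 & 1 & -2 & 2 \\ 1 & 2 & 0 & 1 \\ -2 & 0 & 3 & -2 \\ 2 & 1 & -2 & -1 \end{bmatrix}
\]
to produce a symmetric tridiagonal matrix that is similar to $A$.

Solution For the first application of a Householder transformation,
\[
\alpha = -(1)\left(\sum_{j=2}^{4} a_{j1}^2\right)^{1/2} = -3, \qquad
r = \left(\frac{1}{2}(-3)^2 - \frac{1}{2}(1)(-3)\right)^{1/2} = \sqrt{6},
\]
\[
\mathbf{w} = \left(0, \frac{\sqrt{6}}{3}, -\frac{\sqrt{6}}{6}, \frac{\sqrt{6}}{6}\right)^t,
\]
\[
P^{(1)} = \begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix}
- 2\left(\frac{\sqrt{6}}{6}\right)^2 \begin{bmatrix} 0 \\ 2 \\ -1 \\ 1 \end{bmatrix} \cdot (0, 2, -1, 1)
= \begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & -\frac{1}{3} & \frac{2}{3} & -\frac{2}{3} \\ 0 & \frac{2}{3} & \frac{2}{3} & \frac{1}{3} \\ 0 & -\frac{2}{3} & \frac{1}{3} & \frac{2}{3} \end{bmatrix},
\]
and
\[
A^{(2)} = \begin{bmatrix} 4 & -3 & 0 & 0 \\ -3 & \frac{10}{3} & 1 & \frac{4}{3} \\ 0 & 1 & \frac{5}{3} & -\frac{4}{3} \\ 0 & \frac{4}{3} & -\frac{4}{3} & -1 \end{bmatrix}.
\]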
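Continuing to the second iteration,
\[
P^{(2)} = \begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & -\frac{3}{5} & -\frac{4}{5} \\ 0 & 0 & -\frac{4}{5} & \frac{3}{5} \end{bmatrix},
\]
and the symmetric tridiagonal matrix is
\[
A^{(3)} = \begin{bmatrix} 4 & -3 & 0 & 0 \\ -3 & \frac{10}{3} & -\frac{5}{3} & 0 \\ 0 & -\frac{5}{3} & -\frac{33}{25} & \frac{68}{75} \\ 0 & 0 & \frac{68}{75} & \frac{149}{75} \end{bmatrix}.
\]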


Algorithm 9.5 performs Householder’s method as described here, although the actual 
matrix multiplications are circumvented. 


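Householder's

To obtain a symmetric tridiagonal matrix $A^{(n-1)}$ similar to the symmetric matrix $A = A^{(1)}$,
construct the following matrices $A^{(2)}, A^{(3)}, \ldots, A^{(n-1)}$, where $A^{(k)} = (a^{(k)}_{ij})$ for each $k = 1, 2, \ldots, n-1$:

INPUT dimension $n$; matrix $A$.

OUTPUT $A^{(n-1)}$. (At each step, $A$ can be overwritten.)

Step 1 For $k = 1, 2, \ldots, n-2$ do Steps 2-14.

Step 2 Set $q = \displaystyle\sum_{j=k+1}^{n} \left(a^{(k)}_{jk}\right)^2$.

Step 3 If $a^{(k)}_{k+1,k} = 0$ then set $\alpha = -q^{1/2}$
else set $\alpha = -\dfrac{q^{1/2}\, a^{(k)}_{k+1,k}}{\left|a^{(k)}_{k+1,k}\right|}$.

Step 4 Set $RSQ = \alpha^2 - \alpha a^{(k)}_{k+1,k}$. (Note: $RSQ = 2r^2$.)

Step 5 Set $v_k = 0$; (Note: $v_1 = \cdots = v_{k-1} = 0$, but are not needed.)
$v_{k+1} = a^{(k)}_{k+1,k} - \alpha$;
For $j = k+2, \ldots, n$ set $v_j = a^{(k)}_{jk}$.
(Note: $\mathbf{w} = \dfrac{1}{\sqrt{2RSQ}}\mathbf{v} = \dfrac{1}{2r}\mathbf{v}$.)

Step 6 For $j = k, k+1, \ldots, n$ set $u_j = \dfrac{1}{RSQ}\displaystyle\sum_{i=k+1}^{n} a^{(k)}_{ji} v_i$.
(Note: $\mathbf{u} = \dfrac{1}{RSQ}A^{(k)}\mathbf{v} = \dfrac{1}{2r^2}A^{(k)}\mathbf{v} = \dfrac{1}{r}A^{(k)}\mathbf{w}$.)

Step 7 Set $PROD = \displaystyle\sum_{i=k+1}^{n} v_i u_i$.
(Note: $PROD = \mathbf{v}^t\mathbf{u} = \dfrac{1}{2r^2}\mathbf{v}^tA^{(k)}\mathbf{v}$.)

Step 8 For $j = k, k+1, \ldots, n$ set $z_j = u_j - \left(\dfrac{PROD}{2RSQ}\right)v_j$.
(Note: $\mathbf{z} = \mathbf{u} - \dfrac{1}{2RSQ}\mathbf{v}^t\mathbf{u}\mathbf{v} = \mathbf{u} - \dfrac{1}{4r^2}\mathbf{v}^t\mathbf{u}\mathbf{v}
= \mathbf{u} - \mathbf{w}\mathbf{w}^t\mathbf{u} = \dfrac{1}{r}A^{(k)}\mathbf{w} - \mathbf{w}\mathbf{w}^t\dfrac{1}{r}A^{(k)}\mathbf{w}$.)

Step 9 For $l = k+1, k+2, \ldots, n-1$ do Steps 10 and 11.
(Note: Compute $A^{(k+1)} = A^{(k)} - \mathbf{v}\mathbf{z}^t - \mathbf{z}\mathbf{v}^t = (I - 2\mathbf{w}\mathbf{w}^t)A^{(k)}(I - 2\mathbf{w}\mathbf{w}^t)$.)

Step 10 For $j = l+1, \ldots, n$ set
$a^{(k+1)}_{jl} = a^{(k)}_{jl} - v_l z_j - v_j z_l$;
$a^{(k+1)}_{lj} = a^{(k+1)}_{jl}$.

Step 11 Set $a^{(k+1)}_{ll} = a^{(k)}_{ll} - 2v_l z_l$.

Step 12 Set $a^{(k+1)}_{nn} = a^{(k)}_{nn} - 2v_n z_n$.

Step 13 For $j = k+2, \ldots, n$ set $a^{(k+1)}_{kj} = a^{(k+1)}_{jk} = 0$.

Step 14 Set $a^{(k+1)}_{k+1,k} = a^{(k)}_{k+1,k} - v_{k+1}z_k$;
$a^{(k+1)}_{k,k+1} = a^{(k+1)}_{k+1,k}$.
(Note: The other elements of $A^{(k+1)}$ are the same as $A^{(k)}$.)

Step 15 OUTPUT $(A^{(n-1)})$;
(The process is complete. $A^{(n-1)}$ is symmetric, tridiagonal, and similar to $A$.)
STOP.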
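The algorithm can be sketched in Python as follows. This is a minimal illustration rather than a tuned implementation, and it makes two simplifying assumptions that are not in the algorithm as stated: the rank-two update $A - \mathbf{v}\mathbf{z}^t - \mathbf{z}\mathbf{v}^t$ is applied to the whole matrix instead of entry by entry as in Steps 9 through 14, and a column that is already zero below the subdiagonal is skipped to avoid dividing by $RSQ = 0$. The function name `householder_tridiagonalize` is hypothetical.

```python
import math


def householder_tridiagonalize(A):
    """Reduce a symmetric matrix (list of lists) to a similar tridiagonal matrix."""
    n = len(A)
    a = [[float(t) for t in row] for row in A]
    for k in range(n - 2):
        # Step 2: sum of squares below the diagonal in column k
        q = sum(a[j][k] ** 2 for j in range(k + 1, n))
        if q == 0.0:
            continue                      # assumption: nothing to eliminate in this column
        # Step 3: alpha = -sgn(a[k+1][k]) * sqrt(q), with sgn(0) taken as +1
        alpha = -math.copysign(math.sqrt(q), a[k + 1][k])
        # Step 4: RSQ = 2 r^2
        rsq = alpha * alpha - alpha * a[k + 1][k]
        # Step 5: v = 2 r w
        v = [0.0] * n
        v[k + 1] = a[k + 1][k] - alpha
        for j in range(k + 2, n):
            v[j] = a[j][k]
        # Steps 6-8: u = A v / RSQ, PROD = v^t u, z = u - (PROD / (2 RSQ)) v
        u = [sum(a[j][i] * v[i] for i in range(k + 1, n)) / rsq for j in range(n)]
        prod = sum(v[i] * u[i] for i in range(k + 1, n))
        z = [u[j] - prod / (2 * rsq) * v[j] for j in range(n)]
        # Steps 9-14 (done in one pass): A <- A - v z^t - z v^t
        a = [[a[r][c] - v[r] * z[c] - z[r] * v[c] for c in range(n)] for r in range(n)]
        # Step 13: set the eliminated entries to exact zeros
        for j in range(k + 2, n):
            a[j][k] = a[k][j] = 0.0
    return a


A = [[4, 1, -2, 2], [1, 2, 0, 1], [-2, 0, 3, -2], [2, 1, -2, -1]]
T = householder_tridiagonalize(A)
for row in T:
    print(row)
```

For the matrix of Example 1 the result should agree, up to round-off, with the tridiagonal matrix obtained by hand, whose nonzero entries are $4, -3, 10/3, -5/3, -33/25, 68/75$, and $149/75$.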


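Householder's method can be implemented in Maple with the LinearAlgebra package.
For the matrix in Example 1 we would do the following.
with(LinearAlgebra): A := Matrix([[4, 1, -2, 2], [1, 2, 0, 1], [-2, 0, 3, -2], [2, 1, -2, -1]])
Then an orthogonal matrix Q and a tridiagonal matrix T with $A = QTQ^t$ are found using
Q := TridiagonalForm(A, output = 'Q'); T := TridiagonalForm(A, output = 'T')
The matrices produced by Maple are the 10-digit approximations to
\[
Q = \begin{bmatrix}
1 & 0 & 0 & 0 \\
0 & -0.\overline{3} & 0.1\overline{3} & -0.9\overline{3} \\
0 & 0.\overline{6} & -0.\overline{6} & -0.\overline{3} \\
0 & -0.\overline{6} & -0.7\overline{3} & 0.1\overline{3}
\end{bmatrix}
\quad \text{and} \quad
T = \begin{bmatrix}
4 & -3 & 0 & 0 \\
-3 & 3.\overline{3} & -1.\overline{6} & 0 \\
0 & -1.\overline{6} & -1.32 & 0.90\overline{6} \\
0 & 0 & 0.90\overline{6} & 1.98\overline{6}
\end{bmatrix}.
\]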

In the next section, we will examine how the QR algorithm can be applied to determine
the eigenvalues of $A^{(n-1)}$, which are the same as those of the original matrix $A$.

Householder's Algorithm can be applied to an arbitrary $n \times n$ matrix, but modifications
must be made to account for a possible lack of symmetry. The resulting matrix $A^{(n-1)}$ will
not be tridiagonal unless the original matrix $A$ is symmetric, but all the entries below the
lower subdiagonal will be 0. A matrix of this type is called upper Hessenberg. That is,
$H = (h_{ij})$ is upper Hessenberg if $h_{ij} = 0$, for all $i \ge j + 2$.

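The required modifications for arbitrary matrices are:

Step 6 For $j = 1, 2, \ldots, n$ set $u_j = \dfrac{1}{RSQ}\displaystyle\sum_{i=k+1}^{n} a^{(k)}_{ji} v_i$;
$y_j = \dfrac{1}{RSQ}\displaystyle\sum_{i=k+1}^{n} a^{(k)}_{ij} v_i$.

Step 8 For $j = 1, 2, \ldots, n$ set $z_j = u_j - \dfrac{PROD}{RSQ}v_j$.

Step 9 For $l = k+1, k+2, \ldots, n$ do Steps 10 and 11.

Step 10 For $j = 1, 2, \ldots, k$ set $a^{(k+1)}_{jl} = a^{(k)}_{jl} - z_j v_l$;
$a^{(k+1)}_{lj} = a^{(k)}_{lj} - y_j v_l$.

Step 11 For $j = k+1, \ldots, n$ set $a^{(k+1)}_{jl} = a^{(k)}_{jl} - z_j v_l - y_l v_j$.

After these steps are modified, delete Steps 12 through 14 and output $A^{(n-1)}$.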
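The modified steps amount to the update $A^{(k+1)} = A^{(k)} - \mathbf{z}\mathbf{v}^t - \mathbf{v}\mathbf{y}^t$ with $\mathbf{u} = A^{(k)}\mathbf{v}/RSQ$ and $\mathbf{y} = (A^{(k)})^t\mathbf{v}/RSQ$. The following sketch applies that update to the whole matrix at once, which is a simplification of Steps 9 through 11; the function name `householder_hessenberg`, the test matrix, and the guard against a zero column are assumptions made for illustration.

```python
import math


def householder_hessenberg(A):
    """Reduce a square matrix to a similar upper Hessenberg matrix."""
    n = len(A)
    a = [[float(t) for t in row] for row in A]
    for k in range(n - 2):
        q = sum(a[j][k] ** 2 for j in range(k + 1, n))
        if q == 0.0:
            continue                      # assumption: column already reduced
        alpha = -math.copysign(math.sqrt(q), a[k + 1][k])
        rsq = alpha * alpha - alpha * a[k + 1][k]
        v = [0.0] * n
        v[k + 1] = a[k + 1][k] - alpha
        for j in range(k + 2, n):
            v[j] = a[j][k]
        # Modified Step 6: u uses rows of A, y uses columns of A
        u = [sum(a[j][i] * v[i] for i in range(k + 1, n)) / rsq for j in range(n)]
        y = [sum(a[i][j] * v[i] for i in range(k + 1, n)) / rsq for j in range(n)]
        prod = sum(v[i] * u[i] for i in range(k + 1, n))
        # Modified Step 8: note the divisor RSQ rather than 2 RSQ
        z = [u[j] - prod / rsq * v[j] for j in range(n)]
        # Modified Steps 9-11 in one pass: A <- A - z v^t - v y^t
        a = [[a[j][l] - z[j] * v[l] - v[j] * y[l] for l in range(n)] for j in range(n)]
    return a


A = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 1, 2, 3], [4, 5, 6, 7]]
H = householder_hessenberg(A)
for row in H:
    print(row)
```

Because each step is an orthogonal similarity transformation, the trace and the sum of the squares of the entries are preserved, which gives a quick sanity check on the output.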


EXERCISE SET 9.4 


1. Use Householder’s method to place the following matrices in tridiagonal form. 


12 10 4 2 -l1 -l 
a. 10 8 —5 b. -1 2 -l 
4 -5 3 -l1 -1l 2 
1 1 4.75 2.25  —0.25 
c. 1 1 0 d. 2.25 4.75 1.25 
0 1 —0.25 1.25 4.75 
2. Use Householder’s method to place the following matrices in tridiagonal form. 
4 -1 -1 0 5 -2 -05 ed 
-1 4 0 -!il —2 = 15  —0.5 
a b. 
-1 0 4 -1 —0.5 1.5 5 —2 
0 -1l -l 4 15  -0.5 —2 5 | 
8 0.25 0.5 2 -1 
0.25 —4 0 1 2 
c 0.5 0 3 0.75 -1 
2 1 0.75 ) —0.5 
—1 2 -1 —0.5 6 
2 =1 <1 0 0 
| -1 0 -—2 O ] 
d. —l 0 4 2 1 | 
0 -2 2 8 3 
0 0 1 3 5 | 


3. Modify Householder's Algorithm 9.5 to compute similar upper Hessenberg matrices for the following
nonsymmetric matrices.


a =i 3 = 
a. o° 0 4 b. 2 
ao a 3 
a 4 

@ ab 2 ef zi 

= i 2 -2% 9 =i 
-1 4 0 3 = 


2 3 

3 =2 

? 21 

ei. i: 4 
a os | 

= ae oe | 

4 of 2 


9.5 The QR Algorithm


The deflation methods discussed in Section 9.3 are not generally suitable for calculating
all the eigenvalues of a matrix because of the growth of round-off error. In this section we
consider the QR Algorithm, a matrix reduction technique used to simultaneously determine
all the eigenvalues of a symmetric matrix.

To apply the QR method, we begin with a symmetric matrix in tridiagonal form; that
is, the only nonzero entries in the matrix lie either on the diagonal or on the subdiagonals
directly above or below the diagonal. If this is not the form of the symmetric matrix, the first
step is to apply Householder's method to compute a symmetric, tridiagonal matrix similar
to the given matrix.

In the remainder of this section it will be assumed that the symmetric matrix for which
these eigenvalues are to be calculated is tridiagonal. If we let $A$ denote a matrix of this type,
we can simplify the notation somewhat by labeling the entries of $A$ as follows:
\[
A = \begin{bmatrix}
a_1 & b_2 & 0 & \cdots & 0 \\
b_2 & a_2 & b_3 & \ddots & \vdots \\
0 & b_3 & a_3 & \ddots & 0 \\
\vdots & \ddots & \ddots & \ddots & b_n \\
0 & \cdots & 0 & b_n & a_n
\end{bmatrix}. \tag{9.14}
\]
If $b_2 = 0$ or $b_n = 0$, then the $1 \times 1$ matrix $[a_1]$ or $[a_n]$ immediately produces an eigenvalue $a_1$
or $a_n$ of $A$. The QR method takes advantage of this observation by successively decreasing
the values of the entries below the main diagonal until $b_2 \approx 0$ or $b_n \approx 0$.

When $b_j = 0$ for some $j$, where $2 < j < n$, the problem can be reduced to considering,
instead of $A$, the smaller matrices
\[
\begin{bmatrix}
a_1 & b_2 & 0 & \cdots & 0 \\
b_2 & a_2 & b_3 & \ddots & \vdots \\
0 & b_3 & a_3 & \ddots & 0 \\
\vdots & \ddots & \ddots & \ddots & b_{j-1} \\
0 & \cdots & 0 & b_{j-1} & a_{j-1}
\end{bmatrix}
\quad \text{and} \quad
\begin{bmatrix}
a_j & b_{j+1} & 0 & \cdots & 0 \\
b_{j+1} & a_{j+1} & b_{j+2} & \ddots & \vdots \\
0 & b_{j+2} & a_{j+2} & \ddots & 0 \\
\vdots & \ddots & \ddots & \ddots & b_n \\
0 & \cdots & 0 & b_n & a_n
\end{bmatrix}. \tag{9.15}
\]


If none of the $b_j$ are zero, the QR method proceeds by forming a sequence of matrices
$A = A^{(1)}, A^{(2)}, A^{(3)}, \ldots$, as follows:

1. $A^{(1)} = A$ is factored as a product $A^{(1)} = Q^{(1)}R^{(1)}$, where $Q^{(1)}$ is orthogonal and
$R^{(1)}$ is upper triangular.

2. $A^{(2)}$ is defined as $A^{(2)} = R^{(1)}Q^{(1)}$.

In general, $A^{(i)}$ is factored as a product $A^{(i)} = Q^{(i)}R^{(i)}$ of an orthogonal matrix $Q^{(i)}$ and
an upper triangular matrix $R^{(i)}$. Then $A^{(i+1)}$ is defined by the product of $R^{(i)}$ and $Q^{(i)}$ in the
reverse direction $A^{(i+1)} = R^{(i)}Q^{(i)}$. Since $Q^{(i)}$ is orthogonal, $R^{(i)} = Q^{(i)^t}A^{(i)}$ and
\[
A^{(i+1)} = R^{(i)}Q^{(i)} = \left(Q^{(i)^t}A^{(i)}\right)Q^{(i)} = Q^{(i)^t}A^{(i)}Q^{(i)}. \tag{9.16}
\]
This ensures that $A^{(i+1)}$ is symmetric with the same eigenvalues as $A^{(i)}$. By the manner in
which we define $R^{(i)}$ and $Q^{(i)}$, we also ensure that $A^{(i+1)}$ is tridiagonal.

Continuing by induction, $A^{(i+1)}$ has the same eigenvalues as the original matrix $A$, and
$A^{(i+1)}$ tends to a diagonal matrix with the eigenvalues of $A$ along the diagonal.


Rotation Matrices


To describe the construction of the factoring matrices $Q^{(i)}$ and $R^{(i)}$, we need the notion of a
rotation matrix.

Definition 9.23 A rotation matrix $P$ differs from the identity matrix in at most four elements. These four
elements are of the form
\[
p_{ii} = p_{jj} = \cos\theta \quad \text{and} \quad p_{ij} = -p_{ji} = \sin\theta,
\]
for some $\theta$ and some $i \neq j$.

If $A$ is the $2 \times 2$ rotation matrix
\[
A = \begin{bmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{bmatrix},
\]
then $A\mathbf{x}$ is $\mathbf{x}$ rotated counterclockwise by the angle $\theta$.

It is easy to show (see Exercise 8) that, for any rotation matrix $P$, the matrix $AP$ differs
from $A$ only in the $i$th and $j$th columns and the matrix $PA$ differs from $A$ only in the $i$th and
$j$th rows. For any $i \neq j$, the angle $\theta$ can be chosen so that the product $PA$ has a zero entry
for $(PA)_{ij}$. In addition, every rotation matrix $P$ is orthogonal, because the definition implies
that $PP^t = I$.

These are often called Givens rotations because they were used by James Wallace Givens
(1910-1993) in the 1950s when he was at Argonne National Laboratories.

Example 1 Find a rotation matrix $P$ with the property that $PA$ has a zero entry in the second row and
first column, where
\[
A = \begin{bmatrix} 3 & 1 & 0 \\ 1 & 3 & 1 \\ 0 & 1 & 3 \end{bmatrix}.
\]
Solution The form of $P$ is
\[
P = \begin{bmatrix} \cos\theta & \sin\theta & 0 \\ -\sin\theta & \cos\theta & 0 \\ 0 & 0 & 1 \end{bmatrix},
\quad \text{so} \quad
PA = \begin{bmatrix} 3\cos\theta + \sin\theta & \cos\theta + 3\sin\theta & \sin\theta \\ -3\sin\theta + \cos\theta & -\sin\theta + 3\cos\theta & \cos\theta \\ 0 & 1 & 3 \end{bmatrix}.
\]
The angle $\theta$ is chosen so that $-3\sin\theta + \cos\theta = 0$, that is, so that $\tan\theta = \frac{1}{3}$. Hence
\[
\cos\theta = \frac{3\sqrt{10}}{10}, \qquad \sin\theta = \frac{\sqrt{10}}{10},
\]
and
\[
PA = \begin{bmatrix} \frac{3\sqrt{10}}{10} & \frac{\sqrt{10}}{10} & 0 \\ -\frac{\sqrt{10}}{10} & \frac{3\sqrt{10}}{10} & 0 \\ 0 & 0 & 1 \end{bmatrix}
\begin{bmatrix} 3 & 1 & 0 \\ 1 & 3 & 1 \\ 0 & 1 & 3 \end{bmatrix}
= \begin{bmatrix} \sqrt{10} & \frac{3}{5}\sqrt{10} & \frac{1}{10}\sqrt{10} \\ 0 & \frac{4}{5}\sqrt{10} & \frac{3}{10}\sqrt{10} \\ 0 & 1 & 3 \end{bmatrix}.
\]
Note that the resulting matrix is neither symmetric nor tridiagonal.
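The choice of angle in this example generalizes directly: to zero the entry in row $j$ and column $i$ one takes $\cos\theta$ and $\sin\theta$ proportional to $a_{ii}$ and $a_{ji}$. The sketch below is illustrative; the function names are hypothetical, indices are zero-based, and it assumes $a_{ii}$ and $a_{ji}$ are not both zero.

```python
import math


def rotation_to_zero(A, i, j):
    """Rotation P with P[i][i] = P[j][j] = cos, P[i][j] = -P[j][i] = sin, so (PA)[j][i] = 0."""
    n = len(A)
    h = math.hypot(A[i][i], A[j][i])      # assumed nonzero
    c, s = A[i][i] / h, A[j][i] / h
    P = [[1.0 if r == col else 0.0 for col in range(n)] for r in range(n)]
    P[i][i] = P[j][j] = c
    P[i][j] = s
    P[j][i] = -s
    return P


def matmul(X, Y):
    """Plain matrix product for square lists of lists."""
    n = len(X)
    return [[sum(X[r][k] * Y[k][c] for k in range(n)) for c in range(n)] for r in range(n)]


A = [[3, 1, 0], [1, 3, 1], [0, 1, 3]]
P = rotation_to_zero(A, 0, 1)             # zero the second-row, first-column entry
PA = matmul(P, A)
for row in PA:
    print(row)
```

The printed rows should match the product computed in the example, with $\sqrt{10} \approx 3.16228$ in the leading position.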
The factorization of $A^{(1)}$ into $A^{(1)} = Q^{(1)}R^{(1)}$ uses a product of $n - 1$ rotation matrices
to construct
\[
R^{(1)} = P_nP_{n-1} \cdots P_2A^{(1)}.
\]
We first choose the rotation matrix $P_2$ with
\[
p_{11} = p_{22} = \cos\theta_2 \quad \text{and} \quad p_{12} = -p_{21} = \sin\theta_2,
\]
where
\[
\sin\theta_2 = \frac{b_2}{\sqrt{b_2^2 + a_1^2}} \quad \text{and} \quad \cos\theta_2 = \frac{a_1}{\sqrt{b_2^2 + a_1^2}}.
\]
This choice gives
\[
(-\sin\theta_2)a_1 + (\cos\theta_2)b_2 = \frac{-b_2a_1}{\sqrt{b_2^2 + a_1^2}} + \frac{a_1b_2}{\sqrt{b_2^2 + a_1^2}} = 0
\]
for the entry in the (2, 1) position, that is, in the second row and first column of the product
$P_2A^{(1)}$. So the matrix
\[
A_2^{(1)} = P_2A^{(1)}
\]
has a zero in the (2, 1) position.

The multiplication $P_2A^{(1)}$ affects both rows 1 and 2 of $A^{(1)}$, so the matrix $A_2^{(1)}$ does
not necessarily retain zero entries in positions $(1, 3), (1, 4), \ldots$, and $(1, n)$. However, $A^{(1)}$ is
tridiagonal, so the $(1, 4), \ldots, (1, n)$ entries of $A_2^{(1)}$ must also be 0. Only the (1, 3)-entry, the
one in the first row and third column, can become nonzero in $A_2^{(1)}$.
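In general, the matrix $P_k$ is chosen so that the $(k, k-1)$ entry in $A_k^{(1)} = P_kA_{k-1}^{(1)}$ is zero.
This results in the $(k-1, k+1)$-entry becoming nonzero. The matrix $A_k^{(1)}$ has the form
\[
A_k^{(1)} = \begin{bmatrix}
z_1 & q_1 & r_1 & 0 & \cdots & & & 0 \\
0 & \ddots & \ddots & \ddots & \ddots & & & \vdots \\
\vdots & \ddots & z_{k-1} & q_{k-1} & r_{k-1} & \ddots & & \vdots \\
 & & 0 & x_k & y_k & 0 & & \\
 & & & b_{k+1} & a_{k+1} & b_{k+2} & \ddots & \vdots \\
\vdots & & & & \ddots & \ddots & \ddots & 0 \\
 & & & & & \ddots & \ddots & b_n \\
0 & \cdots & & & \cdots & 0 & b_n & a_n
\end{bmatrix}
\]
and $P_{k+1}$ has the form
\[
P_{k+1} = \begin{bmatrix}
I_{k-1} & O & O \\
O & \begin{matrix} c_{k+1} & s_{k+1} \\ -s_{k+1} & c_{k+1} \end{matrix} & O \\
O & O & I_{n-k-1}
\end{bmatrix}, \tag{9.17}
\]
in which the entry $c_{k+1}$ in the upper left of the $2 \times 2$ block lies in row $k$ and column $k$, and
where $O$ denotes the appropriately dimensional matrix with all zero entries.

The constants $c_{k+1} = \cos\theta_{k+1}$ and $s_{k+1} = \sin\theta_{k+1}$ in $P_{k+1}$ are chosen so that the
$(k+1, k)$-entry in $A_{k+1}^{(1)}$ is zero; that is, $-s_{k+1}x_k + c_{k+1}b_{k+1} = 0$.

Since $c_{k+1}^2 + s_{k+1}^2 = 1$, the solution to this equation is
\[
s_{k+1} = \frac{b_{k+1}}{\sqrt{b_{k+1}^2 + x_k^2}} \quad \text{and} \quad c_{k+1} = \frac{x_k}{\sqrt{b_{k+1}^2 + x_k^2}},
\]
and $A_{k+1}^{(1)}$ has the form
\[
A_{k+1}^{(1)} = \begin{bmatrix}
z_1 & q_1 & r_1 & 0 & \cdots & & & 0 \\
0 & \ddots & \ddots & \ddots & \ddots & & & \vdots \\
\vdots & \ddots & z_k & q_k & r_k & \ddots & & \vdots \\
 & & 0 & x_{k+1} & y_{k+1} & 0 & & \\
 & & & b_{k+2} & a_{k+2} & b_{k+3} & \ddots & \vdots \\
\vdots & & & & \ddots & \ddots & \ddots & 0 \\
 & & & & & \ddots & \ddots & b_n \\
0 & \cdots & & & \cdots & 0 & b_n & a_n
\end{bmatrix}.
\]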

Proceeding with this construction in the sequence $P_2, \ldots, P_n$ produces the upper
triangular matrix
\[
R^{(1)} \equiv A_n^{(1)} = \begin{bmatrix}
z_1 & q_1 & r_1 & 0 & \cdots & 0 \\
0 & \ddots & \ddots & \ddots & \ddots & \vdots \\
\vdots & \ddots & \ddots & \ddots & \ddots & 0 \\
\vdots & & \ddots & \ddots & \ddots & r_{n-2} \\
\vdots & & & \ddots & z_{n-1} & q_{n-1} \\
0 & \cdots & \cdots & \cdots & 0 & x_n
\end{bmatrix}.
\]
The other half of the QR factorization is the matrix
\[
Q^{(1)} = P_2^tP_3^t \cdots P_n^t,
\]
because the orthogonality of the rotation matrices implies that
\[
Q^{(1)}R^{(1)} = (P_2^tP_3^t \cdots P_n^t) \cdot (P_n \cdots P_3P_2)A^{(1)} = A^{(1)}.
\]
The matrix $Q^{(1)}$ is orthogonal because
\[
(Q^{(1)})^tQ^{(1)} = (P_2^tP_3^t \cdots P_n^t)^t(P_2^tP_3^t \cdots P_n^t) = (P_n \cdots P_3P_2) \cdot (P_2^tP_3^t \cdots P_n^t) = I.
\]
In addition, $Q^{(1)}$ is an upper-Hessenberg matrix. To see why this is true, you can follow the
steps in Exercises 9 and 10.

As a consequence, $A^{(2)} = R^{(1)}Q^{(1)}$ is also an upper-Hessenberg matrix, because
multiplying $Q^{(1)}$ on the left by the upper triangular matrix $R^{(1)}$ does not affect the entries in the
lower triangle. We already know that it is symmetric, so $A^{(2)}$ is tridiagonal.

The entries off the diagonal of $A^{(2)}$ will generally be smaller in magnitude than the
corresponding entries of $A^{(1)}$, so $A^{(2)}$ is closer to being a diagonal matrix than is $A^{(1)}$. The
process is repeated to construct $A^{(3)}, A^{(4)}, \ldots$ until satisfactory convergence is obtained.
(See [Wil2], pages 516-523.)
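Example 2 Apply one iteration of the QR method to the matrix that was given in Example 1:
\[
A = \begin{bmatrix} 3 & 1 & 0 \\ 1 & 3 & 1 \\ 0 & 1 & 3 \end{bmatrix}.
\]
Solution Let $A^{(1)} = A$ be the given matrix and $P_2$ represent the rotation matrix determined
in Example 1. We found, using the notation introduced in the QR method, that
\[
A_2^{(1)} = P_2A^{(1)} = \begin{bmatrix} \frac{3\sqrt{10}}{10} & \frac{\sqrt{10}}{10} & 0 \\ -\frac{\sqrt{10}}{10} & \frac{3\sqrt{10}}{10} & 0 \\ 0 & 0 & 1 \end{bmatrix}
\begin{bmatrix} 3 & 1 & 0 \\ 1 & 3 & 1 \\ 0 & 1 & 3 \end{bmatrix}
= \begin{bmatrix} \sqrt{10} & \frac{3\sqrt{10}}{5} & \frac{\sqrt{10}}{10} \\ 0 & \frac{4\sqrt{10}}{5} & \frac{3\sqrt{10}}{10} \\ 0 & 1 & 3 \end{bmatrix}
\equiv \begin{bmatrix} z_1 & q_1 & r_1 \\ 0 & x_2 & y_2 \\ 0 & b_3^{(1)} & a_3^{(1)} \end{bmatrix}.
\]
Continuing, we have
\[
s_3 = \frac{b_3^{(1)}}{\sqrt{x_2^2 + (b_3^{(1)})^2}} = 0.36761 \quad \text{and} \quad c_3 = \frac{x_2}{\sqrt{x_2^2 + (b_3^{(1)})^2}} = 0.92998,
\]
so
\[
R^{(1)} \equiv A_3^{(1)} = P_3A_2^{(1)} = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 0.92998 & 0.36761 \\ 0 & -0.36761 & 0.92998 \end{bmatrix}
\begin{bmatrix} \sqrt{10} & \frac{3\sqrt{10}}{5} & \frac{\sqrt{10}}{10} \\ 0 & \frac{4\sqrt{10}}{5} & \frac{3\sqrt{10}}{10} \\ 0 & 1 & 3 \end{bmatrix}
= \begin{bmatrix} \sqrt{10} & \frac{3\sqrt{10}}{5} & \frac{\sqrt{10}}{10} \\ 0 & 2.7203 & 1.9851 \\ 0 & 0 & 2.4412 \end{bmatrix},
\]
and
\[
Q^{(1)} = P_2^tP_3^t = \begin{bmatrix} \frac{3\sqrt{10}}{10} & -\frac{\sqrt{10}}{10} & 0 \\ \frac{\sqrt{10}}{10} & \frac{3\sqrt{10}}{10} & 0 \\ 0 & 0 & 1 \end{bmatrix}
\begin{bmatrix} 1 & 0 & 0 \\ 0 & 0.92998 & -0.36761 \\ 0 & 0.36761 & 0.92998 \end{bmatrix}
= \begin{bmatrix} 0.94868 & -0.29409 & 0.11625 \\ 0.31623 & 0.88226 & -0.34874 \\ 0 & 0.36761 & 0.92998 \end{bmatrix}.
\]
As a consequence,
\[
A^{(2)} = R^{(1)}Q^{(1)} = \begin{bmatrix} \sqrt{10} & \frac{3\sqrt{10}}{5} & \frac{\sqrt{10}}{10} \\ 0 & 2.7203 & 1.9851 \\ 0 & 0 & 2.4412 \end{bmatrix}
\begin{bmatrix} 0.94868 & -0.29409 & 0.11625 \\ 0.31623 & 0.88226 & -0.34874 \\ 0 & 0.36761 & 0.92998 \end{bmatrix}
= \begin{bmatrix} 3.6 & 0.86024 & 0 \\ 0.86024 & 3.12973 & 0.89740 \\ 0 & 0.89740 & 2.27027 \end{bmatrix}.
\]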
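One QR step on a symmetric tridiagonal matrix never needs the full matrices $Q^{(1)}$ and $R^{(1)}$; the quantities $z_j$, $q_j$, $x_j$, $y_j$, $c_j$, and $s_j$ suffice. The following sketch stores only the diagonal `a` and the off-diagonal `b`, with zero-based indices so that `b[j]` couples rows `j` and `j+1`. The function name `qr_step` and the storage layout are assumptions made for illustration, and no off-diagonal entry is assumed to vanish together with the current pivot.

```python
import math


def qr_step(a, b):
    """One unshifted QR step A -> RQ for a symmetric tridiagonal matrix.

    a: diagonal entries (length n); b: off-diagonal entries (length n - 1).
    Returns the diagonal and off-diagonal of the next iterate.
    """
    n = len(a)
    c = [1.0] * n            # c[j], s[j] belong to the rotation acting on rows j-1 and j
    s = [0.0] * n
    z = [0.0] * n
    q = [0.0] * n
    x, y = a[0], b[0]
    for j in range(1, n):
        z[j - 1] = math.hypot(x, b[j - 1])
        c[j] = x / z[j - 1]
        s[j] = b[j - 1] / z[j - 1]
        q[j - 1] = c[j] * y + s[j] * a[j]
        x = -s[j] * y + c[j] * a[j]
        if j < n - 1:
            y = c[j] * b[j]
    z[n - 1] = x
    # Entries of RQ expressed through the rotation constants
    new_a = [s[j + 1] * q[j] + c[j] * c[j + 1] * z[j] for j in range(n - 1)]
    new_a.append(c[n - 1] * z[n - 1])
    new_b = [s[j + 1] * z[j + 1] for j in range(n - 1)]
    return new_a, new_b


diag, off = qr_step([3.0, 3.0, 3.0], [1.0, 1.0])
print(diag)
print(off)
```

For the matrix of this example the output should agree with the entries of $A^{(2)}$ to the digits shown there.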
The off-diagonal elements of $A^{(2)}$ are smaller than those of $A^{(1)}$ by about 14%, so we have
a reduction but it is not substantial. To decrease to below 0.001 we would need to perform
13 iterations of the QR method. Doing this gives
\[
A^{(13)} = \begin{bmatrix} 4.4139 & 0.01941 & 0 \\ 0.01941 & 3.0003 & 0.00095 \\ 0 & 0.00095 & 1.5858 \end{bmatrix}.
\]
This would give an approximate eigenvalue of 1.5858 and the remaining eigenvalues could
be approximated by considering the reduced matrix
\[
\begin{bmatrix} 4.4139 & 0.01941 \\ 0.01941 & 3.0003 \end{bmatrix}.
\]


Accelerating Convergence 


If the eigenvalues of $A$ have distinct moduli with $|\lambda_1| > |\lambda_2| > \cdots > |\lambda_n|$, then the rate of
convergence of the entry $b_{j+1}^{(i+1)}$ to 0 in the matrix $A^{(i+1)}$ depends on the ratio $|\lambda_{j+1}/\lambda_j|$ (see
[Fr]). The rate of convergence of $b_{j+1}^{(i+1)}$ to 0 determines the rate at which the entry $a_j^{(i+1)}$
converges to the $j$th eigenvalue $\lambda_j$. Thus, the rate of convergence can be slow if $|\lambda_{j+1}/\lambda_j|$ is
not significantly less than 1.

To accelerate this convergence, a shifting technique is employed similar to that used
with the Inverse Power method in Section 9.3. A constant $\sigma$ is selected close to an eigenvalue
of $A$. This modifies the factorization in Eq. (9.16) to choosing $Q^{(i)}$ and $R^{(i)}$ so that
\[
A^{(i)} - \sigma I = Q^{(i)}R^{(i)}, \tag{9.18}
\]
and, correspondingly, the matrix $A^{(i+1)}$ is defined to be
\[
A^{(i+1)} = R^{(i)}Q^{(i)} + \sigma I. \tag{9.19}
\]
With this modification, the rate of convergence of $b_{j+1}^{(i+1)}$ to 0 depends on the ratio
$|(\lambda_{j+1} - \sigma)/(\lambda_j - \sigma)|$. This can result in a significant improvement over the original rate of
convergence of $a_j^{(i+1)}$ to $\lambda_j$ if $\sigma$ is close to $\lambda_{j+1}$ but not close to $\lambda_j$.

We change $\sigma$ at each step so that when $A$ has eigenvalues of distinct modulus, $b_n^{(i+1)}$
converges to 0 faster than $b_j^{(i+1)}$ for any integer $j$ less than $n$. When $b_n^{(i+1)}$ is sufficiently
small, we assume that $\lambda_n \approx a_n^{(i+1)}$, delete the $n$th row and column of the matrix, and
proceed in the same manner to find an approximation to $\lambda_{n-1}$. The process is continued
until an approximation has been determined for each eigenvalue.

The shifting technique chooses, at the $i$th step, the shifting constant $\sigma_i$, where $\sigma_i$ is the
eigenvalue of the matrix
\[
E^{(i)} = \begin{bmatrix} a_{n-1}^{(i)} & b_n^{(i)} \\ b_n^{(i)} & a_n^{(i)} \end{bmatrix}
\]
that is closest to $a_n^{(i)}$. This shift translates the eigenvalues of $A$ by the amount $\sigma_i$. With this
shifting technique, the convergence is usually cubic. (See [WR], p. 270.) The method
accumulates these shifts until $b_n^{(i+1)} \approx 0$ and then adds the shifts to $a_n^{(i+1)}$ to approximate the
eigenvalue $\lambda_n$.

If $A$ has eigenvalues of the same modulus, $b_j^{(i+1)}$ may tend to 0 for some $j \neq n$ at a
faster rate than $b_n^{(i+1)}$. In this case, the matrix-splitting technique described in (9.15) can be
employed to reduce the problem to one involving a pair of matrices of reduced order.
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Example 3 Incorporate shifting into the QR method for the matrix 


a? © 9 
bs” as” bY 


3 1 
A=|1 3 
0 1 0 pi) a) 


Wr © 
| 


Solution To find the acceleration parameter for shifting requires finding the eigenvalues of 


ay) bs -[? | 


1 @® JO LI 3 
which are 4; = 4 and jz = 2. The choice of eigenvalue closest to a = 3 is arbitrary, and 
we choose 42 = 2 and shift by this amount. Then o; = 2 and 
a by) 0 i. TQ 
bY de BP S| Pot 
0 pe 7 0 1 1 
Continuing the computation gives 
p) V2 
x =1, y=, z= v2, oe 52:23 
2 2 
V2 V2 
gn = v2, xX, = 0, Y=, and J2. S73 
2 2 
so 
W242 2 
1 
AP=| 0 0 2 
0 1 1 
Further, 
2 
2= 1, c3 = 0, 3 1, nQ= 1, and es 
so 
V2 J2 #2 
R® = A 0 1 1 
V2 
0 0 -+ 
To compute A”, we have 
/2 /2 J2 
3 9” an = 2. oe => y? a — 1, a => “A? and a = 0, 
so 
2, 22 
(2) bya yen) v2 v2 
le s+ =a 
0 -2 oO 
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One iteration of the QR method is complete. Neither pb?) = /2/2 nor Ee = —/2/2 is 
small, so another iteration of the QR method is performed. For this iteration, we calculate 
the eigenvalues 5 + 5 3 of the matrix 


2 PT pf. -2 
Be ay? me 0 > 


and choose 07 = 4 - 5V3, the closest eigenvalue to a = 0. Completing the calculations 


gives 
2.6720277 —0.37597448 0 
A® = | 0.37597448 — 1.4736080 0.030396964 
0 0.030396964 —0.047559530 


If aS = 0.030396964 is sufficiently small, then the approximation to the eigenvalue 13 is 
1.5864151, the sum of a3) and the shifts 0; + 02 = 2+ (1 — V3)/2. Deleting the third 
row and column gives 


AO = 2.6720277 0.37597448 
~ | 0.37597448  1.4736080 |’ 


which has eigenvalues 4; = 2.7802140 and 2 = 1.3654218. Adding the shifts gives the 
approximations 
Ai © 4.4141886 and A. © 2.9993964. 


The actual eigenvalues of the matrix A are 4.41420, 3.00000, and 1.58579, so the QR method 
gave four significant digits of accuracy in only two iterations. a 
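
As a numerical cross-check of the first iteration, the sketch below factors the shifted matrix with NumPy and multiplies the factors in reverse order. NumPy and the variable names are assumptions for illustration. A library QR routine may pick different signs for the columns of $Q$ than the rotation matrices do, so the off-diagonal entries can differ in sign from those in the example, but the diagonal entries and the eigenvalues are unaffected.

```python
import numpy as np

A = np.array([[3.0, 1.0, 0.0],
              [1.0, 3.0, 1.0],
              [0.0, 1.0, 3.0]])
sigma1 = 2.0                                  # first shift chosen in the example

# Factor A - sigma1*I = Q R, then form R Q (the shift is accumulated, not added back).
Q, R = np.linalg.qr(A - sigma1 * np.eye(3))
A2 = R @ Q

print(np.round(A2, 6))
# Eigenvalues of A2 plus the shift should match the eigenvalues of A.
print(np.sort(np.linalg.eigvalsh(A2)) + sigma1)
```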


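Algorithm 9.6 implements the QR method with shifting incorporated to accelerate convergence.

QR

To obtain the eigenvalues of the symmetric, tridiagonal $n \times n$ matrix

\[ A \equiv A_1 = \begin{bmatrix} a_1^{(1)} & b_2^{(1)} & 0 & \cdots & 0 \\ b_2^{(1)} & a_2^{(1)} & \ddots & \ddots & \vdots \\ 0 & \ddots & \ddots & \ddots & 0 \\ \vdots & \ddots & \ddots & \ddots & b_n^{(1)} \\ 0 & \cdots & 0 & b_n^{(1)} & a_n^{(1)} \end{bmatrix}: \]

INPUT $n$; $a_1^{(1)}, \ldots, a_n^{(1)}$, $b_2^{(1)}, \ldots, b_n^{(1)}$; tolerance TOL; maximum number of iterations M.

OUTPUT eigenvalues of A, or recommended splitting of A, or a message that the maximum number of iterations was exceeded.

Step 1 Set $k = 1$;
    SHIFT = 0. (Accumulated shift.)

Step 2 While $k \le M$ do Steps 3-19.
    (Steps 3-7 test for success.)

Step 3 If $|b_n^{(k)}| \le$ TOL then set $\lambda = a_n^{(k)} +$ SHIFT;
    OUTPUT ($\lambda$);
    set $n = n - 1$.

Step 4 If $|b_2^{(k)}| \le$ TOL then set $\lambda = a_1^{(k)} +$ SHIFT;
    OUTPUT ($\lambda$);
    set $n = n - 1$;
    $a_1^{(k)} = a_2^{(k)}$;
    for $j = 2, \ldots, n$
        set $a_j^{(k)} = a_{j+1}^{(k)}$;
        $b_j^{(k)} = b_{j+1}^{(k)}$.

Step 5 If $n = 0$ then
    STOP.

Step 6 If $n = 1$ then
    set $\lambda = a_1^{(k)} +$ SHIFT;
    OUTPUT ($\lambda$);
    STOP.

Step 7 For $j = 3, \ldots, n - 1$
    if $|b_j^{(k)}| \le$ TOL then
        OUTPUT ('split into', $a_1^{(k)}, \ldots, a_{j-1}^{(k)}$, $b_2^{(k)}, \ldots, b_{j-1}^{(k)}$,
            'and', $a_j^{(k)}, \ldots, a_n^{(k)}$, $b_{j+1}^{(k)}, \ldots, b_n^{(k)}$, SHIFT);
        STOP.

Step 8 (Compute shift.)
    Set $b = -(a_{n-1}^{(k)} + a_n^{(k)})$;
    $c = a_n^{(k)} a_{n-1}^{(k)} - [b_n^{(k)}]^2$;
    $d = (b^2 - 4c)^{1/2}$.

Step 9 If $b > 0$ then set $\mu_1 = -2c/(b + d)$;
        $\mu_2 = -(b + d)/2$
    else set $\mu_1 = (d - b)/2$;
        $\mu_2 = 2c/(d - b)$.

Step 10 If $n = 2$ then set $\lambda_1 = \mu_1 +$ SHIFT;
    $\lambda_2 = \mu_2 +$ SHIFT;
    OUTPUT ($\lambda_1, \lambda_2$);
    STOP.

Step 11 Choose $\sigma$ so that $|\sigma - a_n^{(k)}| = \min\{|\mu_1 - a_n^{(k)}|, |\mu_2 - a_n^{(k)}|\}$.

Step 12 (Accumulate the shift.)
    Set SHIFT = SHIFT + $\sigma$.

Step 13 (Perform shift.)
    For $j = 1, \ldots, n$, set $d_j = a_j^{(k)} - \sigma$.

Step 14 (Steps 14 and 15 compute $R^{(k)}$.)
    Set $x_1 = d_1$;
    $y_1 = b_2^{(k)}$.

Step 15 For $j = 2, \ldots, n$
    set $z_{j-1} = \{x_{j-1}^2 + [b_j^{(k)}]^2\}^{1/2}$;
    $c_j = x_{j-1}/z_{j-1}$;
    $s_j = b_j^{(k)}/z_{j-1}$;
    $q_{j-1} = c_j y_{j-1} + s_j d_j$;
    $x_j = -s_j y_{j-1} + c_j d_j$;
    If $j \ne n$ then set $r_{j-1} = s_j b_{j+1}^{(k)}$;
        $y_j = c_j b_{j+1}^{(k)}$.
    ($A_j^{(k)} = P_j A_{j-1}^{(k)}$ has just been computed and $R^{(k)} = A_n^{(k)}$.)

Step 16 (Steps 16-18 compute $A^{(k+1)}$.)
    Set $z_n = x_n$;
    $a_1^{(k+1)} = s_2 q_1 + c_2 z_1$;
    $b_2^{(k+1)} = s_2 z_2$.

Step 17 For $j = 2, 3, \ldots, n - 1$
    set $a_j^{(k+1)} = s_{j+1} q_j + c_j c_{j+1} z_j$;
    $b_{j+1}^{(k+1)} = s_{j+1} z_{j+1}$.

Step 18 Set $a_n^{(k+1)} = c_n z_n$.

Step 19 Set $k = k + 1$.

Step 20 OUTPUT ('Maximum number of iterations exceeded');
    (The procedure was unsuccessful.)
    STOP.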
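
The idea behind the algorithm can be sketched compactly in Python. This is a simplified illustration under stated assumptions, not a transcription of Algorithm 9.6: it stores the full matrix, uses NumPy's general QR factorization instead of the rotation recurrences of Steps 14-18, deflates only at the bottom of the matrix (Step 3), and omits the splitting test of Step 7. The function name and default tolerances are hypothetical.

```python
import numpy as np

def qr_shift_eigenvalues(A, tol=1e-10, max_iter=500):
    """Eigenvalues of a symmetric tridiagonal matrix by shifted QR with deflation."""
    T = np.array(A, dtype=float)
    shift_total = 0.0                     # plays the role of SHIFT
    found = []
    for _ in range(max_iter):
        n = T.shape[0]
        if n == 1:                        # one eigenvalue left
            found.append(T[0, 0] + shift_total)
            return sorted(found)
        if abs(T[n - 1, n - 2]) <= tol:
            # Last off-diagonal entry is negligible: accept a_n and delete row/column n.
            found.append(T[n - 1, n - 1] + shift_total)
            T = T[:n - 1, :n - 1]
            continue
        # Shift: eigenvalue of the trailing 2 x 2 block closest to a_n.
        mid = 0.5 * (T[n - 2, n - 2] + T[n - 1, n - 1])
        rad = np.hypot(0.5 * (T[n - 2, n - 2] - T[n - 1, n - 1]), T[n - 1, n - 2])
        sigma = mid - rad if T[n - 1, n - 1] <= mid else mid + rad
        shift_total += sigma
        # One QR step on the shifted matrix; the shift is accumulated, not added back.
        Q, R = np.linalg.qr(T - sigma * np.eye(n))
        T = R @ Q
    raise RuntimeError("Maximum number of iterations exceeded")

A = [[3, 1, 0], [1, 3, 1], [0, 1, 3]]     # the matrix of Example 3
print(qr_shift_eigenvalues(A))
```

Working with the full matrix costs more arithmetic per step than the tridiagonal recurrences, so this form is meant for checking small examples rather than for production use.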


A similar procedure can be used to find approximations to the eigenvalues of a non- 
symmetric n x n matrix. The matrix is first reduced to a similar upper-Hessenberg matrix 
H using the Householder Algorithm for nonsymmetric matrices described at the end of 
Section 9.4. 

The QR factoring process assumes the following form. First

\[ H \equiv H^{(1)} = Q^{(1)} R^{(1)}. \tag{9.20} \]

Then $H^{(2)}$ is defined by

\[ H^{(2)} = R^{(1)} Q^{(1)} \tag{9.21} \]

and factored into

\[ H^{(2)} = Q^{(2)} R^{(2)}. \tag{9.22} \]


The method of factoring proceeds with the same aim as the QR Algorithm for Symmetric Matrices. That is, the matrices are chosen to introduce zeros at appropriate entries of the matrix, and a shifting procedure is used similar to that in the QR method. However, the shifting is somewhat more complicated for nonsymmetric matrices since complex eigenvalues with the same modulus can occur. The shifting process modifies the calculations in Eqs. (9.20), (9.21), and (9.22) to obtain the double QR method

\[ H^{(1)} - \sigma_1 I = Q^{(1)} R^{(1)}, \qquad H^{(2)} = R^{(1)} Q^{(1)} + \sigma_1 I, \]
\[ H^{(2)} - \sigma_2 I = Q^{(2)} R^{(2)}, \qquad H^{(3)} = R^{(2)} Q^{(2)} + \sigma_2 I, \]

where $\sigma_1$ and $\sigma_2$ are complex conjugates and $H^{(1)}, H^{(2)}, \ldots$ are real upper-Hessenberg matrices.

James Hardy Wilkinson (1919-1986) is best known for his extensive work on numerical methods for solving systems of linear equations and eigenvalue problems. He also developed the numerical linear algebra technique of backward error analysis.

A complete description of the QR method can be found in works of Wilkinson [Wil2]. Detailed algorithms and programs for this method and most other commonly employed methods are given in [WR]. We refer the reader to these works if the method we have discussed does not give satisfactory results.

The QR method can be performed in a manner that will produce the eigenvectors of a matrix as well as its eigenvalues, but Algorithm 9.6 has not been designed to accomplish this. If the eigenvectors of a symmetric matrix are needed as well as the eigenvalues, we suggest either using the Inverse Power method after Algorithms 9.5 and 9.6 have been employed or using one of the more powerful techniques listed in [WR].


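EXERCISE SET 9.5

1. Apply two iterations of the QR method without shifting to the following matrices.

a. $\begin{bmatrix} 2 & -1 & 0 \\ -1 & 2 & -1 \\ 0 & -1 & 2 \end{bmatrix}$
b. $\begin{bmatrix} 3 & 1 & 0 \\ 1 & 4 & 2 \\ 0 & 2 & 1 \end{bmatrix}$
c. $\begin{bmatrix} 4 & -1 & 0 \\ -1 & 3 & -1 \\ 0 & -1 & 2 \end{bmatrix}$
d. $\begin{bmatrix} 1 & 1 & 0 & 0 \\ 1 & 2 & -1 & 0 \\ 0 & -1 & 3 & 1 \\ 0 & 0 & 1 & 4 \end{bmatrix}$
e. $\begin{bmatrix} -2 & 1 & 0 & 0 \\ 1 & -3 & -1 & 0 \\ 0 & -1 & 1 & 1 \\ 0 & 0 & 1 & 3 \end{bmatrix}$
f. $\begin{bmatrix} 0.5 & 0.25 & 0 & 0 \\ 0.25 & 0.8 & 0.4 & 0 \\ 0 & 0.4 & 0.6 & 0.1 \\ 0 & 0 & 0.1 & 1 \end{bmatrix}$

2. Apply two iterations of the QR method without shifting to the following matrices.

a. $\begin{bmatrix} 2 & -1 & 0 \\ -1 & -1 & -2 \\ 0 & -2 & 3 \end{bmatrix}$
b. $\begin{bmatrix} 3 & 1 & 0 \\ 1 & 4 & 2 \\ 0 & 2 & 3 \end{bmatrix}$
c. $\begin{bmatrix} 4 & 2 & 0 & 0 & 0 \\ 2 & 4 & 2 & 0 & 0 \\ 0 & 2 & 4 & 2 & 0 \\ 0 & 0 & 2 & 4 & 2 \\ 0 & 0 & 0 & 2 & 4 \end{bmatrix}$
d. $\begin{bmatrix} 5 & -1 & 0 & 0 & 0 \\ -1 & 4.5 & 0.2 & 0 & 0 \\ 0 & 0.2 & 1 & -0.4 & 0 \\ 0 & 0 & -0.4 & 3 & 1 \\ 0 & 0 & 0 & 1 & 3 \end{bmatrix}$

3. Use the QR Algorithm to determine, to within $10^{-5}$, all the eigenvalues for the matrices given in Exercise 1.

4. Use the QR Algorithm to determine, to within $10^{-5}$, all the eigenvalues of the following matrices.

a. $\begin{bmatrix} 2 & -1 & 0 \\ -1 & -1 & -2 \\ 0 & -2 & 3 \end{bmatrix}$
b. $\begin{bmatrix} 3 & 1 & 0 \\ 1 & 4 & 2 \\ 0 & 2 & 3 \end{bmatrix}$
c. $\begin{bmatrix} 4 & 2 & 0 & 0 & 0 \\ 2 & 4 & 2 & 0 & 0 \\ 0 & 2 & 4 & 2 & 0 \\ 0 & 0 & 2 & 4 & 2 \\ 0 & 0 & 0 & 2 & 4 \end{bmatrix}$
d. $\begin{bmatrix} 5 & -1 & 0 & 0 & 0 \\ -1 & 4.5 & 0.2 & 0 & 0 \\ 0 & 0.2 & 1 & -0.4 & 0 \\ 0 & 0 & -0.4 & 3 & 1 \\ 0 & 0 & 0 & 1 & 3 \end{bmatrix}$

5. Use the Inverse Power method to determine, to within $10^{-5}$, the eigenvectors of the matrices in Exercise 1.

6. Use the Inverse Power method to determine, to within $10^{-5}$, the eigenvectors of the matrices in Exercise 2.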


7. a. Show that the rotation matrix

\[ \begin{bmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{bmatrix} \]

applied to the vector $x = (x_1, x_2)^t$ has the geometric effect of rotating $x$ through the angle $\theta$ without changing its magnitude with respect to the $l_2$ norm.

b. Show that the magnitude of $x$ with respect to the $l_\infty$ norm can be changed by a rotation matrix.

8. Let $P$ be the rotation matrix with $p_{ii} = p_{jj} = \cos\theta$ and $p_{ij} = -p_{ji} = \sin\theta$, for $j < i$. Show that for any $n \times n$ matrix $A$:

\[ (AP)_{pq} = \begin{cases} a_{pq}, & \text{if } q \ne i, j, \\ (\cos\theta)a_{pj} + (\sin\theta)a_{pi}, & \text{if } q = j, \\ (\cos\theta)a_{pi} - (\sin\theta)a_{pj}, & \text{if } q = i. \end{cases} \]

\[ (PA)_{pq} = \begin{cases} a_{pq}, & \text{if } p \ne i, j, \\ (\cos\theta)a_{jq} - (\sin\theta)a_{iq}, & \text{if } p = j, \\ (\sin\theta)a_{jq} + (\cos\theta)a_{iq}, & \text{if } p = i. \end{cases} \]

9. Show that the product of an upper triangular matrix (on the left) and an upper Hessenberg matrix produces an upper Hessenberg matrix.

10. Let $P_k$ denote a rotation matrix of the form given in (9.17).

a. Show that $P_2^t P_3^t$ differs from an upper triangular matrix only in at most the (2,1) and (3,2) positions.

b. Assume that $P_2^t P_3^t \cdots P_k^t$ differs from an upper triangular matrix only in at most the $(2,1), (3,2), \ldots, (k, k-1)$ positions. Show that $P_2^t P_3^t \cdots P_k^t P_{k+1}^t$ differs from an upper triangular matrix only in at most the $(2,1), (3,2), \ldots, (k, k-1), (k+1, k)$ positions.

c. Show that the matrix $P_2^t P_3^t \cdots P_n^t$ is upper Hessenberg.


11. Jacobi's method for a symmetric matrix $A$ is described by

\[ A_1 = A, \qquad A_2 = P_1 A_1 P_1^t, \]

and, in general,

\[ A_{i+1} = P_i A_i P_i^t. \]

The matrix $A_{i+1}$ tends to a diagonal matrix, where $P_i$ is a rotation matrix chosen to eliminate a large off-diagonal element in $A_i$. Suppose $a_{j,k}$ and $a_{k,j}$ are to be set to 0, where $j \ne k$. If $a_{jj} \ne a_{kk}$, then

\[ (P_i)_{jj} = (P_i)_{kk} = \sqrt{\frac{1}{2}\left(1 + \frac{b}{\sqrt{c^2 + b^2}}\right)}, \qquad (P_i)_{kj} = \frac{c}{2 (P_i)_{jj} \sqrt{c^2 + b^2}} = -(P_i)_{jk}, \]

where

\[ c = 2a_{jk}\,\mathrm{sgn}(a_{jj} - a_{kk}) \quad \text{and} \quad b = |a_{jj} - a_{kk}|, \]

or if $a_{jj} = a_{kk}$,

\[ (P_i)_{jj} = (P_i)_{kk} = \frac{\sqrt{2}}{2} \]

and

\[ (P_i)_{kj} = -(P_i)_{jk} = \frac{\sqrt{2}}{2}. \]

Develop an algorithm to implement Jacobi's method by setting $a_{21} = 0$. Then set $a_{31}, a_{32}, a_{41}, a_{42}, a_{43}, \ldots, a_{n,1}, \ldots, a_{n,n-1}$ in turn to zero. This is repeated until a matrix $A_k$ is computed with

\[ \sum_{i=1}^{n} \sum_{\substack{j=1 \\ j \ne i}}^{n} |a_{ij}^{(k)}| \]

sufficiently small. The eigenvalues of $A$ can then be approximated by the diagonal entries of $A_k$.

12. Repeat Exercise 3 using the Jacobi method.


13. In the lead example of this chapter, the linear system $Aw = -0.04(\rho/p)\lambda w$ must be solved for $w$ and $\lambda$ in order to approximate the eigenvalues $\lambda_k$ of the Sturm-Liouville system.

a. Find all four eigenvalues $\mu_1, \ldots, \mu_4$ of the matrix


to within $10^{-5}$.

b. Approximate the eigenvalues $\lambda_1, \ldots, \lambda_4$ of the system in terms of $\rho$ and $p$.

14. The $(m - 1) \times (m - 1)$ tridiagonal matrix

\[ A = \begin{bmatrix} 1 - 2\alpha & \alpha & 0 & \cdots & 0 \\ \alpha & 1 - 2\alpha & \alpha & \ddots & \vdots \\ 0 & \ddots & \ddots & \ddots & 0 \\ \vdots & \ddots & \ddots & \ddots & \alpha \\ 0 & \cdots & 0 & \alpha & 1 - 2\alpha \end{bmatrix} \]

is involved in the Forward Difference method to solve the heat equation (see Section 12.2). For the stability of the method we need $\rho(A) < 1$. With $m = 11$, approximate the eigenvalues of $A$ for each of the following.

a. $\alpha = \frac{1}{4}$
b. $\alpha = \frac{1}{2}$

When is the method stable?

15. The eigenvalues of the matrix $A$ in Exercise 14 are

\[ \lambda_i = 1 - 4\alpha \left( \sin \frac{\pi i}{2m} \right)^2, \quad \text{for } i = 1, \ldots, m - 1. \]

Compare the approximations in Exercise 14 to the actual eigenvalues. Again, when is the method stable?
9.6 Singular Value Decomposition

In this section we consider the factorization of a general $m \times n$ matrix $A$ into what is called a Singular Value Decomposition. This factorization takes the form

\[ A = U S V^t, \]

where $U$ is an $m \times m$ orthogonal matrix, $V$ is an $n \times n$ orthogonal matrix, and $S$ is an $m \times n$ matrix whose only nonzero elements lie along the main diagonal. We will assume throughout this section that $m \ge n$, and in many important applications $m$ is much larger than $n$.

Singular Value Decomposition has quite a long history, being first considered by mathematicians in the latter part of the 19th century. However, the important applications of the technique had to wait until computing power became available in the second half of the 20th century, when algorithms could be developed for its efficient implementation. These were primarily the work of Gene Golub (1932-2007) in a series of papers in the 1960s and 1970s (see, in particular, [GK] and [GR]). A quite complete history of the technique can be found in a paper by G. W. Stewart, which is available through the internet at the address given in [Stew3].

To factor $A$ we consider the $n \times n$ matrix $A^tA$ and the $m \times m$ matrix $AA^t$. The following definition is used to describe some essential properties of arbitrary matrices.

Definition 9.24 Let $A$ be an $m \times n$ matrix.

(i) The Rank of $A$, denoted Rank($A$), is the number of linearly independent rows in $A$.

(ii) The Nullity of $A$, denoted Nullity($A$), is $n -$ Rank($A$), and describes the largest set of linearly independent vectors $v$ in $\mathbb{R}^n$ for which $Av = 0$.

The Rank and Nullity of a matrix are important in characterizing the behavior of the matrix. When the matrix is square, for example, the matrix is invertible if and only if its Nullity is 0 and its Rank is the same as the size of the matrix.

The following is one of the basic theorems in linear algebra.

Theorem 9.25 The number of linearly independent rows of an $m \times n$ matrix $A$ is the same as the number of linearly independent columns of $A$.

The next result gives some useful facts about the matrices $AA^t$ and $A^tA$.

Theorem 9.26 Let $A$ be an $m \times n$ matrix.

(i) The matrices $A^tA$ and $AA^t$ are symmetric.
(ii) Nullity($A$) = Nullity($A^tA$).
(iii) Rank($A$) = Rank($A^tA$).
(iv) The eigenvalues of $A^tA$ and $AA^t$ are real and nonnegative.
(v) The nonzero eigenvalues of $AA^t$ and $A^tA$ are the same.
Proof (i) Because $(A^tA)^t = A^t(A^t)^t = A^tA$, this matrix is symmetric, and similarly, so is $AA^t$.

(ii) Let $v \ne 0$ be a vector with $Av = 0$. Then

\[ (A^tA)v = A^t(Av) = A^t 0 = 0, \quad \text{so} \quad \text{Nullity}(A) \le \text{Nullity}(A^tA). \]

Now suppose that $v$ is a vector with $A^tAv = 0$. Then

\[ 0 = v^tA^tAv = (Av)^tAv = \|Av\|_2^2, \quad \text{which implies that } Av = 0. \]

Hence Nullity($A^tA$) $\le$ Nullity($A$). As a consequence, Nullity($A^tA$) = Nullity($A$).

(iii) The matrices $A$ and $A^tA$ both have $n$ columns and their Nullities agree, so

\[ \text{Rank}(A) = n - \text{Nullity}(A) = n - \text{Nullity}(A^tA) = \text{Rank}(A^tA). \]

(iv) The matrix $A^tA$ is symmetric so by Corollary 9.17 its eigenvalues are real numbers. Suppose that $v$ is an eigenvector of $A^tA$ with $\|v\|_2 = 1$ corresponding to the eigenvalue $\lambda$. Then

\[ 0 \le \|Av\|_2^2 = (Av)^t(Av) = v^tA^tAv = v^t(A^tAv) = v^t(\lambda v) = \lambda v^tv = \lambda\|v\|_2^2 = \lambda. \]

(v) Let $v$ be an eigenvector corresponding to the nonzero eigenvalue $\lambda$ of $A^tA$. Then

\[ A^tAv = \lambda v \quad \text{implies that} \quad (AA^t)Av = \lambda Av. \]

If $Av = 0$, then $A^tAv = A^t 0 = 0$, which contradicts the assumption that $\lambda \ne 0$. Hence $Av \ne 0$ and $Av$ is an eigenvector of $AA^t$ associated with $\lambda$. The reverse conclusion also follows from this argument because if $\lambda$ is a nonzero eigenvalue of $AA^t = (A^t)^tA^t$, then $\lambda$ is also an eigenvalue of $A^t(A^t)^t = A^tA$.
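
The statements of Theorem 9.26 can be checked numerically on a small case. The sketch below is only an illustration: the $4 \times 3$ matrix is a made-up example of rank 2 (its second row is twice the first, and the first is the third plus twice the fourth), and the rank tolerance `1e-9` is an arbitrary choice.

```python
import numpy as np

# Hypothetical 4 x 3 matrix of rank 2, so Nullity(A) = 3 - 2 = 1.
A = np.array([[1.0, 2.0, 3.0],
              [2.0, 4.0, 6.0],
              [1.0, 0.0, 1.0],
              [0.0, 1.0, 1.0]])

AtA = A.T @ A            # 3 x 3
AAt = A @ A.T            # 4 x 4

rank_A = np.linalg.matrix_rank(A, tol=1e-9)
rank_AtA = np.linalg.matrix_rank(AtA, tol=1e-9)

# Eigenvalues in decreasing order; eigvalsh is intended for symmetric matrices.
eig_AtA = np.sort(np.linalg.eigvalsh(AtA))[::-1]
eig_AAt = np.sort(np.linalg.eigvalsh(AAt))[::-1]

print("symmetric:", np.allclose(AtA, AtA.T), np.allclose(AAt, AAt.T))   # part (i)
print("ranks:", rank_A, rank_AtA)                                       # parts (ii), (iii)
print("eigenvalues of A^t A:", eig_AtA)                                 # part (iv)
print("eigenvalues of A A^t:", eig_AAt)                                 # part (v)
```

Up to rounding, both lists of eigenvalues contain the same two positive values, and the remaining entries are zero.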


In Section 5 of Chapter 6 we saw how effective factorization can be when solving linear 
systems of the form Ax = b when the matrix A is used repeatedly for varying b. We now 
consider a technique for factoring a general m x n matrix. It has application in many areas, 
including least squares fitting of data, image compression, signal processing, and statistics. 


Constructing a Singular Value Decomposition 


A non-square matrix A, that is, a matrix with a different number of rows and columns, 
cannot have an eigenvalue because Ax and x will be vectors of different sizes. However, 
there are numbers that play roles for non-square matrices that are similar to those played 
by eigenvalues for square matrices. One of the important features of the Singular Value 
Decomposition of a general matrix is that it permits a generalization of eigenvalues and 
eigenvectors in this situation. 


Our objective is to determine a factorization of the $m \times n$ matrix $A$, where $m \ge n$, in the form

\[ A = U S V^t, \]

where $U$ is an $m \times m$ orthogonal matrix, $V$ is an $n \times n$ orthogonal matrix, and $S$ is an $m \times n$ diagonal matrix, that is, its only nonzero entries are $(S)_{ii} \equiv s_i \ge 0$, for $i = 1, \ldots, n$. (See Figure 9.2.)

Figure 9.2 (diagram of the factorization $A = USV^t$: $A$ has $n$ columns, $U$ has $m$ columns, and $S$ and $V^t$ each have $n$ columns)

Constructing S in the factorization $A = USV^t$.

We construct the matrix $S$ by finding the eigenvalues of the $n \times n$ symmetric matrix $A^tA$. These eigenvalues are all non-negative real numbers, and we order them from largest to smallest and denote them by

\[ s_1^2 \ge s_2^2 \ge \cdots \ge s_k^2 > s_{k+1}^2 = \cdots = s_n^2 = 0. \]

That is, we denote by $s_k^2$ the smallest nonzero eigenvalue of $A^tA$. The positive square roots of these eigenvalues of $A^tA$ give the diagonal entries in $S$. They are called the singular values of $A$. Hence,

\[ S = \begin{bmatrix} s_1 & 0 & \cdots & 0 \\ 0 & s_2 & \ddots & \vdots \\ \vdots & \ddots & \ddots & 0 \\ 0 & \cdots & 0 & s_n \\ 0 & \cdots & \cdots & 0 \\ \vdots & & & \vdots \\ 0 & \cdots & \cdots & 0 \end{bmatrix}, \]

where $s_i = 0$ when $k < i \le n$.

Definition 9.27 The singular values of an $m \times n$ matrix $A$ are the positive square roots of the nonzero eigenvalues of the $n \times n$ symmetric matrix $A^tA$.


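Example 1 Determine the singular values of the $5 \times 3$ matrix

\[ A = \begin{bmatrix} 1 & 0 & 1 \\ 0 & 1 & 0 \\ 0 & 1 & 1 \\ 0 & 1 & 0 \\ 1 & 1 & 0 \end{bmatrix}. \]

Solution We have

\[ A^t = \begin{bmatrix} 1 & 0 & 0 & 0 & 1 \\ 0 & 1 & 1 & 1 & 1 \\ 1 & 0 & 1 & 0 & 0 \end{bmatrix}, \quad \text{so} \quad A^tA = \begin{bmatrix} 2 & 1 & 1 \\ 1 & 4 & 1 \\ 1 & 1 & 2 \end{bmatrix}. \]

The characteristic polynomial of $A^tA$ is

\[ p(A^tA) = \lambda^3 - 8\lambda^2 + 17\lambda - 10 = (\lambda - 5)(\lambda - 2)(\lambda - 1), \]

so the eigenvalues of $A^tA$ are $\lambda_1 = s_1^2 = 5$, $\lambda_2 = s_2^2 = 2$, and $\lambda_3 = s_3^2 = 1$. As a consequence, the singular values of $A$ are $s_1 = \sqrt{5}$, $s_2 = \sqrt{2}$, $s_3 = 1$, and in the singular value decomposition of $A$ we have

\[ S = \begin{bmatrix} \sqrt{5} & 0 & 0 \\ 0 & \sqrt{2} & 0 \\ 0 & 0 & 1 \\ 0 & 0 & 0 \\ 0 & 0 & 0 \end{bmatrix}. \]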
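
The arithmetic in this example can be reproduced with a few NumPy calls. This is a sketch for checking the hand computation, with NumPy and the variable names assumed for illustration: `np.poly` returns the coefficients of the characteristic polynomial of a square matrix, and `np.linalg.eigvalsh` returns the eigenvalues of a symmetric matrix.

```python
import numpy as np

A = np.array([[1, 0, 1],
              [0, 1, 0],
              [0, 1, 1],
              [0, 1, 0],
              [1, 1, 0]], dtype=float)

AtA = A.T @ A
char_poly = np.poly(AtA)                              # coefficients, highest power first
eigenvalues = np.sort(np.linalg.eigvalsh(AtA))[::-1]  # decreasing order
singular_values = np.sqrt(eigenvalues)

print(AtA)
print(np.round(char_poly, 10))
print(eigenvalues)
print(singular_values)
```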


When $A$ is a symmetric $n \times n$ matrix, all the $s_i^2$ are eigenvalues of $A^2 = A^tA$, and these are the squares of the eigenvalues of $A$. (See Exercise 15 of Section 7.2.) So in this case the singular values are the absolute values of the eigenvalues of $A$. In the special case when $A$ is positive definite, or even nonnegative definite, the eigenvalues and singular values of $A$ are the same.


Constructing V in the factorization $A = USV^t$.

The $n \times n$ matrix $A^tA$ is symmetric, so by Theorem 9.16 in Section 9.2 (see page 572), we have a factorization

\[ A^tA = V D V^t, \]

where $D$ is a diagonal matrix whose diagonal entries are the eigenvalues of $A^tA$, and $V$ is an orthogonal matrix whose $i$th column is an eigenvector with $l_2$ norm 1 corresponding to the eigenvalue on the $i$th diagonal entry of $D$. The specific diagonal matrix depends on the order of the eigenvalues along the diagonal. We choose $D$ so that these are written in decreasing order. The columns, denoted $v_1, v_2, \ldots, v_n$, of the $n \times n$ orthogonal matrix $V$ are orthonormal eigenvectors corresponding to these eigenvalues. Multiple eigenvalues of $A^tA$ permit multiple choices of the corresponding eigenvectors, so although $D$ is uniquely defined, the matrix $V$ might not be. This causes no problem, because we can choose any such $V$. Because the eigenvalues of $A^tA$ are all nonnegative, the diagonal entries of $D$ are $d_{ii} = s_i^2$, for $i = 1, \ldots, n$; that is, $D = S^tS$.


Constructing U in the factorization $A = USV^t$.

To construct the $m \times m$ matrix $U$, we first consider the nonzero values $s_1 \ge s_2 \ge \cdots \ge s_k > 0$ and the corresponding columns in $V$ given by $v_1, v_2, \ldots, v_k$. We define

\[ u_i = \frac{1}{s_i} A v_i, \quad \text{for } i = 1, 2, \ldots, k. \]

We use these as the first $k$ of the $m$ columns of $U$. Because $A$ is $m \times n$ and each $v_i$ is $n \times 1$, the vector $u_i$ is $m \times 1$, as required. In addition, for each $1 \le i \le k$ and $1 \le j \le k$, the fact that the vectors $v_1, v_2, \ldots, v_n$ are eigenvectors of $A^tA$ that form an orthonormal set implies that

\[ u_i^t u_j = \left( \frac{1}{s_i} A v_i \right)^t \frac{1}{s_j} A v_j = \frac{1}{s_i s_j} v_i^t A^tA v_j = \frac{1}{s_i s_j} v_i^t s_j^2 v_j = \frac{s_j}{s_i} v_i^t v_j = \begin{cases} 0 & \text{if } i \ne j, \\ 1 & \text{if } i = j. \end{cases} \]

So the first $k$ columns of $U$ form an orthonormal set of vectors in $\mathbb{R}^m$. However, we need $m - k$ additional columns of $U$. For this we first need to find $m - k$ vectors which when added to the vectors from the first $k$ columns will give us a linearly independent set. Then we can apply the Gram-Schmidt process to obtain appropriate additional columns.

The matrix $U$ will not be unique unless $k = m$, and then only if all the eigenvalues of $A^tA$ are unique. Non-uniqueness is of no concern; we only need one such matrix $U$.


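Verifying the factorization $A = USV^t$.

To verify that this process actually gives the factorization $A = USV^t$, first recall that the transpose of an orthogonal matrix is also the inverse of the matrix. (See part (i) of Theorem 9.10 on page 570.) Hence to show that $A = USV^t$ we can show the equivalent statement $AV = US$.

The vectors $v_1, v_2, \ldots, v_n$ form a basis for $\mathbb{R}^n$, $Av_i = s_iu_i$, for $i = 1, \ldots, k$, and $Av_i = 0$, for $i = k+1, \ldots, n$ (because $\|Av_i\|_2^2 = v_i^tA^tAv_i = s_i^2 = 0$ for these indices). Only the first $k$ columns of $U$ produce nonzero entries in the product $US$, so we have

\[ \begin{aligned} AV &= A\,[v_1 \; v_2 \; \cdots \; v_k \; v_{k+1} \; \cdots \; v_n] \\ &= [Av_1 \; Av_2 \; \cdots \; Av_k \; Av_{k+1} \; \cdots \; Av_n] \\ &= [s_1u_1 \; s_2u_2 \; \cdots \; s_ku_k \; 0 \; \cdots \; 0] \\ &= [u_1 \; u_2 \; \cdots \; u_k \; 0 \; \cdots \; 0] \begin{bmatrix} s_1 & & & & \\ & \ddots & & & \\ & & s_k & & \\ & & & 0 & \\ & & & & \ddots \end{bmatrix} = US. \end{aligned} \]

This completes the construction of the Singular Value Decomposition of $A$.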
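
The three construction steps can be sketched in Python as follows. This is an illustration under stated assumptions, not a recommended numerical method: the function name `svd_from_eigen` and the relative tolerance `rel_tol` used to decide which singular values count as nonzero are hypothetical, and a QR factorization of $[U_k \mid I]$ stands in for the Gram-Schmidt step that supplies the remaining $m - k$ columns of $U$.

```python
import numpy as np

def svd_from_eigen(A, rel_tol=1e-6):
    """Build U, S, V^t from the eigen-decomposition of A^t A (illustrative sketch)."""
    A = np.asarray(A, dtype=float)
    m, n = A.shape
    # Constructing S and V: eigenvalues of A^t A in decreasing order.
    lam, V = np.linalg.eigh(A.T @ A)
    order = np.argsort(lam)[::-1]
    lam, V = np.clip(lam[order], 0.0, None), V[:, order]
    s = np.sqrt(lam)
    k = int(np.sum(s > rel_tol * s[0]))       # singular values treated as nonzero
    # Constructing U: u_i = A v_i / s_i for i = 1, ..., k.
    U_k = (A @ V[:, :k]) / s[:k]
    # Remaining columns: orthonormal vectors orthogonal to the columns of U_k.
    Q, _ = np.linalg.qr(np.hstack([U_k, np.eye(m)]))
    U = np.hstack([U_k, Q[:, k:]])
    S = np.zeros((m, n))
    S[:n, :n] = np.diag(np.where(np.arange(n) < k, s, 0.0))
    return U, S, V.T

A = np.array([[1, 0, 1], [0, 1, 0], [0, 1, 1], [0, 1, 0], [1, 1, 0]], dtype=float)
U, S, Vt = svd_from_eigen(A)
print(np.allclose(U @ S @ Vt, A))             # verifies A = U S V^t
print(np.allclose(U.T @ U, np.eye(5)))        # verifies that U is orthogonal
```

Forming $A^tA$ squares the condition number of the problem, which is one reason library routines compute the decomposition differently.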


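Example 2 Determine the singular value decomposition of the $5 \times 3$ matrix

\[ A = \begin{bmatrix} 1 & 0 & 1 \\ 0 & 1 & 0 \\ 0 & 1 & 1 \\ 0 & 1 & 0 \\ 1 & 1 & 0 \end{bmatrix}. \]

Solution We found in Example 1 that $A$ has the singular values $s_1 = \sqrt{5}$, $s_2 = \sqrt{2}$, and $s_3 = 1$, so

\[ S = \begin{bmatrix} \sqrt{5} & 0 & 0 \\ 0 & \sqrt{2} & 0 \\ 0 & 0 & 1 \\ 0 & 0 & 0 \\ 0 & 0 & 0 \end{bmatrix}. \]

Eigenvectors of $A^tA$ corresponding to $s_1 = \sqrt{5}$, $s_2 = \sqrt{2}$, and $s_3 = 1$ are, respectively, $(1, 2, 1)^t$, $(1, -1, 1)^t$, and $(-1, 0, 1)^t$ (see Exercise 5). Normalizing these vectors and using the values for the columns of $V$ gives

\[ V = \begin{bmatrix} \frac{\sqrt{6}}{6} & \frac{\sqrt{3}}{3} & -\frac{\sqrt{2}}{2} \\ \frac{\sqrt{6}}{3} & -\frac{\sqrt{3}}{3} & 0 \\ \frac{\sqrt{6}}{6} & \frac{\sqrt{3}}{3} & \frac{\sqrt{2}}{2} \end{bmatrix} \quad \text{and} \quad V^t = \begin{bmatrix} \frac{\sqrt{6}}{6} & \frac{\sqrt{6}}{3} & \frac{\sqrt{6}}{6} \\ \frac{\sqrt{3}}{3} & -\frac{\sqrt{3}}{3} & \frac{\sqrt{3}}{3} \\ -\frac{\sqrt{2}}{2} & 0 & \frac{\sqrt{2}}{2} \end{bmatrix}. \]

The first 3 columns of $U$ are therefore

\[ u_1 = \frac{1}{\sqrt{5}} \cdot A \left( \frac{\sqrt{6}}{6}, \frac{\sqrt{6}}{3}, \frac{\sqrt{6}}{6} \right)^t = \left( \frac{\sqrt{30}}{15}, \frac{\sqrt{30}}{15}, \frac{\sqrt{30}}{10}, \frac{\sqrt{30}}{15}, \frac{\sqrt{30}}{10} \right)^t, \]
\[ u_2 = \frac{1}{\sqrt{2}} \cdot A \left( \frac{\sqrt{3}}{3}, -\frac{\sqrt{3}}{3}, \frac{\sqrt{3}}{3} \right)^t = \left( \frac{\sqrt{6}}{3}, -\frac{\sqrt{6}}{6}, 0, -\frac{\sqrt{6}}{6}, 0 \right)^t, \]
\[ u_3 = 1 \cdot A \left( -\frac{\sqrt{2}}{2}, 0, \frac{\sqrt{2}}{2} \right)^t = \left( 0, 0, \frac{\sqrt{2}}{2}, 0, -\frac{\sqrt{2}}{2} \right)^t. \]

To determine the two remaining columns of $U$ we first need two vectors $x_4$ and $x_5$ so that $\{u_1, u_2, u_3, x_4, x_5\}$ is a linearly independent set. Then we apply the Gram-Schmidt process to obtain $u_4$ and $u_5$ so that $\{u_1, u_2, u_3, u_4, u_5\}$ is an orthogonal set. Two vectors that satisfy the linear independence and orthogonality requirements are

\[ (1, 1, -1, 1, -1)^t \quad \text{and} \quad (0, 1, 0, -1, 0)^t. \]

Normalizing the vectors $u_i$, for $i = 1, 2, 3, 4$, and 5 produces the matrix $U$ and the singular value decomposition as

\[ A = U S V^t = \begin{bmatrix} \frac{\sqrt{30}}{15} & \frac{\sqrt{6}}{3} & 0 & \frac{\sqrt{5}}{5} & 0 \\ \frac{\sqrt{30}}{15} & -\frac{\sqrt{6}}{6} & 0 & \frac{\sqrt{5}}{5} & \frac{\sqrt{2}}{2} \\ \frac{\sqrt{30}}{10} & 0 & \frac{\sqrt{2}}{2} & -\frac{\sqrt{5}}{5} & 0 \\ \frac{\sqrt{30}}{15} & -\frac{\sqrt{6}}{6} & 0 & \frac{\sqrt{5}}{5} & -\frac{\sqrt{2}}{2} \\ \frac{\sqrt{30}}{10} & 0 & -\frac{\sqrt{2}}{2} & -\frac{\sqrt{5}}{5} & 0 \end{bmatrix} \begin{bmatrix} \sqrt{5} & 0 & 0 \\ 0 & \sqrt{2} & 0 \\ 0 & 0 & 1 \\ 0 & 0 & 0 \\ 0 & 0 & 0 \end{bmatrix} \begin{bmatrix} \frac{\sqrt{6}}{6} & \frac{\sqrt{6}}{3} & \frac{\sqrt{6}}{6} \\ \frac{\sqrt{3}}{3} & -\frac{\sqrt{3}}{3} & \frac{\sqrt{3}}{3} \\ -\frac{\sqrt{2}}{2} & 0 & \frac{\sqrt{2}}{2} \end{bmatrix}. \]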


A difficulty with the process in Example 2 is the need to determine the additional 
vectors x4 and x5 to give a linearly independent set on which we can apply the Gram 
Schmidt process. We will now consider a way to simplify the process in many instances. 


An alternative method for finding U 


Part (v) of Theorem 9.26 states that the nonzero eigenvalues of $A^tA$ and those of $AA^t$ are the same. In addition, the corresponding eigenvectors of the symmetric matrices $A^tA$ and $AA^t$ form complete orthonormal subsets of $\mathbb{R}^n$ and $\mathbb{R}^m$, respectively. So the orthonormal set of $n$ eigenvectors for $A^tA$ form the columns of $V$, as outlined above, and the orthonormal set of $m$ eigenvectors for $AA^t$ form the columns of $U$ in the same way.

In summary, then, to determine the Singular Value Decomposition of the $m \times n$ matrix $A$ we can:

• Find the eigenvalues $s_1^2 \ge s_2^2 \ge \cdots \ge s_k^2 > s_{k+1}^2 = \cdots = s_n^2 = 0$ for the symmetric matrix $A^tA$, and place the positive square root of $s_i^2$ in the entry $(S)_{ii}$ of the $m \times n$ diagonal matrix $S$.

• Find a set of orthonormal eigenvectors $\{v_1, v_2, \ldots, v_n\}$ corresponding to the eigenvalues of $A^tA$ and construct the $n \times n$ matrix $V$ with these vectors as columns.

• Find a set of orthonormal eigenvectors $\{u_1, u_2, \ldots, u_m\}$ corresponding to the eigenvalues of $AA^t$ and construct the $m \times m$ matrix $U$ with these vectors as columns. When the nonzero singular values are distinct, the sign of each $u_i$, for $i \le k$, should be chosen so that $Av_i = s_iu_i$.

Then $A$ has the Singular Value Decomposition $A = USV^t$.


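Example 3 Determine the singular value decomposition of the $5 \times 3$ matrix

\[ A = \begin{bmatrix} 1 & 0 & 1 \\ 0 & 1 & 0 \\ 0 & 1 & 1 \\ 0 & 1 & 0 \\ 1 & 1 & 0 \end{bmatrix} \]

by determining $U$ from the eigenvectors of $AA^t$.

Solution We have

\[ AA^t = \begin{bmatrix} 1 & 0 & 1 \\ 0 & 1 & 0 \\ 0 & 1 & 1 \\ 0 & 1 & 0 \\ 1 & 1 & 0 \end{bmatrix} \begin{bmatrix} 1 & 0 & 0 & 0 & 1 \\ 0 & 1 & 1 & 1 & 1 \\ 1 & 0 & 1 & 0 & 0 \end{bmatrix} = \begin{bmatrix} 2 & 0 & 1 & 0 & 1 \\ 0 & 1 & 1 & 1 & 1 \\ 1 & 1 & 2 & 1 & 1 \\ 0 & 1 & 1 & 1 & 1 \\ 1 & 1 & 1 & 1 & 2 \end{bmatrix}, \]

which has the same nonzero eigenvalues as $A^tA$, that is, $\lambda_1 = 5$, $\lambda_2 = 2$, and $\lambda_3 = 1$, and, additionally, $\lambda_4 = 0$ and $\lambda_5 = 0$. Eigenvectors corresponding to these eigenvalues are, respectively,

\[ x_1 = (2, 2, 3, 2, 3)^t, \quad x_2 = (2, -1, 0, -1, 0)^t, \quad x_3 = (0, 0, 1, 0, -1)^t, \quad x_4 = (1, 2, -1, 0, -1)^t, \]

and $x_5 = (0, 1, 0, -1, 0)^t$.

Both the sets $\{x_1, x_2, x_3, x_4\}$ and $\{x_1, x_2, x_3, x_5\}$ are orthogonal because they are eigenvectors associated with distinct eigenvalues of the symmetric matrix $AA^t$. However, $x_4$ is not orthogonal to $x_5$. We will keep $x_4$ as one of the eigenvectors used to form $U$ and determine the fifth vector that will give an orthogonal set. For this we use the Gram-Schmidt process as described in Theorem 9.8 on page 567. Using the notation in that theorem we have

\[ v_1 = x_1, \quad v_2 = x_2, \quad v_3 = x_3, \quad v_4 = x_4, \]

and, because $x_5$ is orthogonal to all but $x_4$,

\[ \begin{aligned} v_5 &= x_5 - \frac{v_4^t x_5}{v_4^t v_4} x_4 \\ &= (0, 1, 0, -1, 0)^t - \frac{(1, 2, -1, 0, -1) \cdot (0, 1, 0, -1, 0)^t}{\|(1, 2, -1, 0, -1)^t\|_2^2} (1, 2, -1, 0, -1)^t \\ &= (0, 1, 0, -1, 0)^t - \frac{2}{7}(1, 2, -1, 0, -1)^t = -\frac{1}{7}(2, -3, -2, 7, -2)^t. \end{aligned} \]

It is easily verified that $v_5$ is orthogonal to $v_4 = x_4$. It is also orthogonal to the vectors in $\{v_1, v_2, v_3\}$ because it is a linear combination of $x_4$ and $x_5$. Normalizing these vectors gives the columns of the matrix $U$ in the factorization. Hence

\[ U = [u_1, u_2, u_3, u_4, u_5] = \begin{bmatrix} \frac{\sqrt{30}}{15} & \frac{\sqrt{6}}{3} & 0 & \frac{\sqrt{7}}{7} & \frac{\sqrt{70}}{35} \\ \frac{\sqrt{30}}{15} & -\frac{\sqrt{6}}{6} & 0 & \frac{2\sqrt{7}}{7} & -\frac{3\sqrt{70}}{70} \\ \frac{\sqrt{30}}{10} & 0 & \frac{\sqrt{2}}{2} & -\frac{\sqrt{7}}{7} & -\frac{\sqrt{70}}{35} \\ \frac{\sqrt{30}}{15} & -\frac{\sqrt{6}}{6} & 0 & 0 & \frac{\sqrt{70}}{10} \\ \frac{\sqrt{30}}{10} & 0 & -\frac{\sqrt{2}}{2} & -\frac{\sqrt{7}}{7} & -\frac{\sqrt{70}}{35} \end{bmatrix}. \]

This is a different $U$ from the one found in Example 2, but it gives a valid factorization $A = USV^t$ using the same $S$ and $V$ as in that example.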


Maple has a SingularValues command in its LinearAlgebra package. It can be used to output the singular values of a matrix $A$ as well as the orthogonal matrices $U$ and $V$. For example, for the matrix $A$ in Examples 2 and 3 the command

U, S, Vt := SingularValues(A, output = ['U', 'S', 'Vt'])

produces orthogonal matrices $U$ and $V^t$ (stored in Vt) and a column vector $S$ containing the singular values of $A$. By default, Maple uses 18 digits of precision for the calculations.
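
For readers working in Python rather than Maple, NumPy's `numpy.linalg.svd` plays a similar role. The sketch below is an illustration, not a translation of the Maple session: with `full_matrices=True` the function returns the $m \times m$ matrix $U$, a one-dimensional array of singular values in decreasing order, and the $n \times n$ matrix $V^t$, computed in double precision.

```python
import numpy as np

A = np.array([[1, 0, 1],
              [0, 1, 0],
              [0, 1, 1],
              [0, 1, 0],
              [1, 1, 0]], dtype=float)

U, s, Vt = np.linalg.svd(A, full_matrices=True)

# Rebuild the 5 x 3 matrix S from the vector of singular values.
S = np.zeros(A.shape)
S[:len(s), :len(s)] = np.diag(s)

print(s)
print(np.allclose(U @ S @ Vt, A))
```

The columns returned for $U$ and $V$ may differ in sign, and in the choice of the last two columns of $U$, from those found by hand in Examples 2 and 3.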


Least Squares Approximation 


The singular value decomposition has application in many areas, one of which is an alternative means for finding the least squares polynomials for fitting data. Let $A$ be an $m \times n$ matrix, with $m > n$, and let $b$ be a vector in $\mathbb{R}^m$. The least squares objective is to find a vector $x$ in $\mathbb{R}^n$ that will minimize $\|Ax - b\|_2$.

Suppose that the singular value decomposition of $A$ is known, that is

\[ A = U S V^t, \]

where $U$ is an $m \times m$ orthogonal matrix, $V$ is an $n \times n$ orthogonal matrix, and $S$ is an $m \times n$ matrix that contains the nonzero singular values in decreasing order along the main diagonal in the first $k \le n$ rows, and zero entries elsewhere. Because $U$ and $V$ are both orthogonal we have $U^{-1} = U^t$, $V^{-1} = V^t$, and by part (iii) of Theorem 9.10 in Section 9.2 on page 570, $U$ and $V$ are both $l_2$-norm preserving. As a consequence,

\[ \|Ax - b\|_2 = \|U S V^t x - U U^t b\|_2 = \|S V^t x - U^t b\|_2. \]

Define $z = V^t x$ and $c = U^t b$. Then

\[ \|Ax - b\|_2 = \|(s_1z_1 - c_1, s_2z_2 - c_2, \ldots, s_kz_k - c_k, -c_{k+1}, \ldots, -c_m)^t\|_2 = \left\{ \sum_{i=1}^{k} (s_iz_i - c_i)^2 + \sum_{i=k+1}^{m} (c_i)^2 \right\}^{1/2}. \]

The norm is minimized when the vector $z$ is chosen with

\[ z_i = \begin{cases} \dfrac{c_i}{s_i}, & \text{when } i \le k, \\ \text{arbitrarily}, & \text{when } k < i \le n. \end{cases} \]

Because $c = U^t b$ and $x = Vz$ are both easy to compute, the least squares solution is also easily found.
Example 4 Use the singular value decomposition technique to determine the least squares polynomial of degree two for the data given in Table 9.5.

Table 9.5

i    x_i     y_i
1    0       1.0000
2    0.25    1.2840
3    0.50    1.6487
4    0.75    2.1170
5    1.00    2.7183

Solution This problem was solved using normal equations as Example 2 in Section 8.1. Here we first need to determine the appropriate form for $A$, $x$, and $b$. In Example 2 in Section 8.1 the problem was described as finding $a_0$, $a_1$, and $a_2$ with

\[ P_2(x) = a_0 + a_1x + a_2x^2. \]

In order to express this in matrix form, we let

\[ x = \begin{bmatrix} a_0 \\ a_1 \\ a_2 \end{bmatrix}, \quad b = \begin{bmatrix} y_0 \\ y_1 \\ y_2 \\ y_3 \\ y_4 \end{bmatrix} = \begin{bmatrix} 1.0000 \\ 1.2840 \\ 1.6487 \\ 2.1170 \\ 2.7183 \end{bmatrix}, \quad \text{and} \quad A = \begin{bmatrix} 1 & x_0 & x_0^2 \\ 1 & x_1 & x_1^2 \\ 1 & x_2 & x_2^2 \\ 1 & x_3 & x_3^2 \\ 1 & x_4 & x_4^2 \end{bmatrix} = \begin{bmatrix} 1 & 0 & 0 \\ 1 & 0.25 & 0.0625 \\ 1 & 0.5 & 0.25 \\ 1 & 0.75 & 0.5625 \\ 1 & 1 & 1 \end{bmatrix}. \]

The singular value decomposition of $A$ has the form $A = USV^t$, where

\[ U = \begin{bmatrix} -0.2945 & -0.6327 & 0.6314 & -0.0143 & -0.3378 \\ -0.3466 & -0.4550 & -0.2104 & 0.2555 & 0.7505 \\ -0.4159 & -0.1942 & -0.5244 & -0.6809 & -0.2250 \\ -0.5025 & 0.1497 & -0.3107 & 0.6524 & -0.4505 \\ -0.6063 & 0.5767 & 0.4308 & -0.2127 & 0.2628 \end{bmatrix}, \]

\[ S = \begin{bmatrix} 2.7117 & 0 & 0 \\ 0 & 0.9371 & 0 \\ 0 & 0 & 0.1627 \\ 0 & 0 & 0 \\ 0 & 0 & 0 \end{bmatrix}, \quad \text{and} \quad V^t = \begin{bmatrix} -0.7987 & -0.4712 & -0.3742 \\ -0.5929 & 0.5102 & 0.6231 \\ 0.1027 & -0.7195 & 0.6869 \end{bmatrix}. \]

So

\[ c = U^t \begin{bmatrix} y_0 \\ y_1 \\ y_2 \\ y_3 \\ y_4 \end{bmatrix} = \begin{bmatrix} -0.2945 & -0.6327 & 0.6314 & -0.0143 & -0.3378 \\ -0.3466 & -0.4550 & -0.2104 & 0.2555 & 0.7505 \\ -0.4159 & -0.1942 & -0.5244 & -0.6809 & -0.2250 \\ -0.5025 & 0.1497 & -0.3107 & 0.6524 & -0.4505 \\ -0.6063 & 0.5767 & 0.4308 & -0.2127 & 0.2628 \end{bmatrix}^t \begin{bmatrix} 1 \\ 1.284 \\ 1.6487 \\ 2.117 \\ 2.7183 \end{bmatrix} = \begin{bmatrix} -4.1372 \\ 0.3473 \\ 0.0099 \\ -0.0059 \\ 0.0155 \end{bmatrix}, \]

and the components of $z$ are

\[ z_1 = \frac{c_1}{s_1} = \frac{-4.1372}{2.7117} = -1.526, \quad z_2 = \frac{c_2}{s_2} = \frac{0.3473}{0.9371} = 0.3706, \quad \text{and} \quad z_3 = \frac{c_3}{s_3} = \frac{0.0099}{0.1627} = 0.0609. \]

This gives the least squares coefficients in $P_2(x)$ as

\[ \begin{bmatrix} a_0 \\ a_1 \\ a_2 \end{bmatrix} = x = Vz = \begin{bmatrix} -0.7987 & -0.5929 & 0.1027 \\ -0.4712 & 0.5102 & -0.7195 \\ -0.3742 & 0.6231 & 0.6869 \end{bmatrix} \begin{bmatrix} -1.526 \\ 0.3706 \\ 0.0609 \end{bmatrix} = \begin{bmatrix} 1.005 \\ 0.8642 \\ 0.8437 \end{bmatrix}, \]

which agrees with the results in Example 2 of Section 8.1. The least squares error using these values uses the last two components of $c$, and is

\[ \|Ax - b\|_2 = \sqrt{c_4^2 + c_5^2} = \sqrt{(-0.0059)^2 + (0.0155)^2} = 0.0165. \]
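
The computation in this example can be repeated in a few lines of Python. The sketch below is illustrative, with NumPy and the variable names assumed: it follows the steps $c = U^tb$, $z_i = c_i/s_i$, $x = Vz$ and takes the residual norm from the trailing components of $c$. Library routines may return singular vectors with different signs than those printed above, which changes the signs of some entries of $c$ and $z$ but not the coefficients or the error.

```python
import numpy as np

x_data = np.array([0.0, 0.25, 0.50, 0.75, 1.00])
y_data = np.array([1.0000, 1.2840, 1.6487, 2.1170, 2.7183])

# Columns 1, x, x^2 for the quadratic P2(x) = a0 + a1 x + a2 x^2.
A = np.column_stack([np.ones_like(x_data), x_data, x_data**2])

U, s, Vt = np.linalg.svd(A, full_matrices=True)
c = U.T @ y_data
k = len(s)                      # A has full column rank here, so k = n = 3
z = c[:k] / s                   # z_i = c_i / s_i
coeffs = Vt.T @ z               # x = V z
residual = np.sqrt(np.sum(c[k:] ** 2))

print(s)
print(coeffs)
print(residual)
```

Because this computation carries full double precision, the coefficients can differ in the third or fourth digit from the values obtained above with four-digit intermediate results.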


Additional Applications 


The reason for the importance of the singular value decomposition in many applications is 
that it permits us to obtain the most important features of an m x n matrix using a matrix that 
is often of significantly smaller size. Because the singular values are listed on the diagonal 
of S in decreasing order, retaining only the first k rows and columns of S produces the best 
possible approximation of this size to the matrix A. As an illustration, recall the figure, 
reproduced for reference as Figure 9.3, that indicates the singular value decomposition of 
the m x n matrix A. 


Figure 9.3 (diagram of the factorization $A = USV^t$: $A$ has $n$ columns, $U$ has $m$ columns, and $S$ and $V^t$ each have $n$ columns)

Replace the $n \times n$ matrix $S$ with the $k \times k$ matrix $S_k$ that contains the most significant singular values. These would certainly be only those that are nonzero, but we might also delete some singular values that are relatively small.

Determine corresponding $m \times k$ and $k \times n$ matrices $U_k$ and $V_k^t$, respectively, in accordance with the singular value decomposition procedure. This is shown shaded in Figure 9.4.

Figure 9.4 (diagram of the reduced factorization $A_k = U_kS_kV_k^t$: $A_k$ has $n$ columns, $U_k$ and $S_k$ each have $k$ columns, and $V_k^t$ has $n$ columns)

Then the new matrix $A_k = U_kS_kV_k^t$ is still of size $m \times n$ and would require $m \cdot n$ storage registers for its representation. However, in factored form, the storage requirement for the data is $m \cdot k$ for $U_k$, $k$ for $S_k$, and $n \cdot k$ for $V_k^t$, for a total of $k(m + n + 1)$.

Suppose, for example, that $m = 2n$ and $k = n/3$. Then the original matrix $A$ contains $mn = 2n^2$ items of data. The factorization producing $A_k$, however, contains only $mk = 2n^2/3$ items for $U_k$, $k = n/3$ for $S_k$, and $nk = n^2/3$ for $V_k^t$, which occupy a total of $(n/3)(3n + 1)$ storage registers. This is a reduction of approximately 50% from the amount required to store the entire matrix $A$, and results in what is called data compression.
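
A small sketch of the truncation, using NumPy and hypothetical names, is given below for the $5 \times 3$ matrix of Examples 2 and 3 with $k = 2$. It counts the storage $k(m + n + 1)$ and measures how far $A_k$ is from $A$. A standard result that is not derived in this section (the Eckart-Young theorem) states that the $l_2$ matrix norm of $A - A_k$ equals the first discarded singular value, here $s_3 = 1$.

```python
import numpy as np

def truncate_svd(A, k):
    """Return U_k (m x k), the k largest singular values, and V_k^t (k x n)."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    return U[:, :k], s[:k], Vt[:k, :]

A = np.array([[1, 0, 1],
              [0, 1, 0],
              [0, 1, 1],
              [0, 1, 0],
              [1, 1, 0]], dtype=float)
m, n = A.shape
k = 2

U_k, s_k, Vt_k = truncate_svd(A, k)
A_k = U_k @ np.diag(s_k) @ Vt_k          # still m x n, but of rank k

stored = k * (m + n + 1)                  # entries kept in factored form
error = np.linalg.norm(A - A_k, 2)        # l2 matrix norm of the difference

print(stored, "stored values instead of", m * n)
print(error)
```

For a matrix this small the factored form needs more storage than the matrix itself (18 values against 15); the saving appears only when $k$ is small relative to both $m$ and $n$.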


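Illustration In Example 2 we demonstrated that

\[ A = U S V^t = \begin{bmatrix} \frac{\sqrt{30}}{15} & \frac{\sqrt{6}}{3} & 0 & \frac{\sqrt{5}}{5} & 0 \\ \frac{\sqrt{30}}{15} & -\frac{\sqrt{6}}{6} & 0 & \frac{\sqrt{5}}{5} & \frac{\sqrt{2}}{2} \\ \frac{\sqrt{30}}{10} & 0 & \frac{\sqrt{2}}{2} & -\frac{\sqrt{5}}{5} & 0 \\ \frac{\sqrt{30}}{15} & -\frac{\sqrt{6}}{6} & 0 & \frac{\sqrt{5}}{5} & -\frac{\sqrt{2}}{2} \\ \frac{\sqrt{30}}{10} & 0 & -\frac{\sqrt{2}}{2} & -\frac{\sqrt{5}}{5} & 0 \end{bmatrix} \begin{bmatrix} \sqrt{5} & 0 & 0 \\ 0 & \sqrt{2} & 0 \\ 0 & 0 & 1 \\ 0 & 0 & 0 \\ 0 & 0 & 0 \end{bmatrix} \begin{bmatrix} \frac{\sqrt{6}}{6} & \frac{\sqrt{6}}{3} & \frac{\sqrt{6}}{6} \\ \frac{\sqrt{3}}{3} & -\frac{\sqrt{3}}{3} & \frac{\sqrt{3}}{3} \\ -\frac{\sqrt{2}}{2} & 0 & \frac{\sqrt{2}}{2} \end{bmatrix}. \]

Consider the reduced matrices associated with this factorization

\[ U_3 = \begin{bmatrix} \frac{\sqrt{30}}{15} & \frac{\sqrt{6}}{3} & 0 \\ \frac{\sqrt{30}}{15} & -\frac{\sqrt{6}}{6} & 0 \\ \frac{\sqrt{30}}{10} & 0 & \frac{\sqrt{2}}{2} \\ \frac{\sqrt{30}}{15} & -\frac{\sqrt{6}}{6} & 0 \\ \frac{\sqrt{30}}{10} & 0 & -\frac{\sqrt{2}}{2} \end{bmatrix}, \quad S_3 = \begin{bmatrix} \sqrt{5} & 0 & 0 \\ 0 & \sqrt{2} & 0 \\ 0 & 0 & 1 \end{bmatrix}, \quad \text{and} \quad V_3^t = \begin{bmatrix} \frac{\sqrt{6}}{6} & \frac{\sqrt{6}}{3} & \frac{\sqrt{6}}{6} \\ \frac{\sqrt{3}}{3} & -\frac{\sqrt{3}}{3} & \frac{\sqrt{3}}{3} \\ -\frac{\sqrt{2}}{2} & 0 & \frac{\sqrt{2}}{2} \end{bmatrix}. \]

Then

\[ S_3V_3^t = \begin{bmatrix} \frac{\sqrt{30}}{6} & \frac{\sqrt{30}}{3} & \frac{\sqrt{30}}{6} \\ \frac{\sqrt{6}}{3} & -\frac{\sqrt{6}}{3} & \frac{\sqrt{6}}{3} \\ -\frac{\sqrt{2}}{2} & 0 & \frac{\sqrt{2}}{2} \end{bmatrix} \quad \text{and} \quad A_3 = U_3S_3V_3^t = \begin{bmatrix} 1 & 0 & 1 \\ 0 & 1 & 0 \\ 0 & 1 & 1 \\ 0 & 1 & 0 \\ 1 & 1 & 0 \end{bmatrix}. \]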

Because the calculations in the Illustration were done using exact arithmetic, the matrix $A_3$ agreed precisely with the original matrix $A$. In general, finite-digit arithmetic would be used to perform the calculations, and absolute agreement would not be expected. The hope is that the data compression does not result in a matrix $A_k$ that significantly differs from the original matrix $A$, and this depends on the relative magnitudes of the singular values of $A$. When the rank of the matrix $A$ is $k$ there will be no deterioration since there are only $k$ rows of the original matrix $A$ that are linearly independent and the matrix could, in theory, be reduced to a matrix which has all zeros in its last $m - k$ rows or $n - k$ columns. When $k$ is less than the rank of $A$, then $A_k$ will differ from $A$, but this is not always to its detriment.

Consider the situation when $A$ is a matrix consisting of pixels in a gray-scale photograph, perhaps taken from a great distance, such as a satellite photo of a portion of the earth. The photograph likely includes noise, that is, data that doesn't truly represent the image, but rather represents the deterioration of the image by atmospheric particles, quality of the lens and reproduction process, etc. The noise data is incorporated in the data given in $A$, but hopefully this noise is much less significant than the true image. We expect the larger singular values to represent the true image and the smaller singular values, those closest to zero, to be contributions of the noise. By performing a singular value decomposition that retains only those singular values above a certain threshold we might be able to eliminate much of the noise, and actually obtain an image that is not only smaller in size but a truer representation than the original photograph. (See [AP] for further details; in particular, Figure 3.)

Additional important applications of the singular value decomposition include deter- 
mining effective condition numbers for square matrices (see Exercise 15), determining the 
effective rank of a matrix, and removing signal noise. For more information on this impor- 
tant topic and a geometric interpretation of the factorization see the survey paper by Kalman 
[Ka] and the references in that paper. For a more complete and extensive study of the theory 
see Golub and Van Loan [GV]. 


EXERCISE SET 9.6 


1. Determine the singular values of the following matrices. 


[i | 
—1 1 1 
eT a 01 -1 
2 -1 [ 1 1 
2. Determine the singular values of the following matrices. 


* a=| 1 a b A= 


— 
—| 
Q 
> 
Il 
1 
| 
— 
SF Oe 
e 


1f 


Fe Orocoedcorre 
Bee Re HOH 


Se Oc OF FF OC 


1 -l 
1 1 
c A= 0 1 d. A= 
1 0 
1 1 


0 


3. Determine a singular value decomposition for the matrices in Exercise 1. 
4. Determine a singular value decomposition for the matrices in Exercise 2. 
5. Let $A$ be the matrix given in Example 2. Show that $(1, 2, 1)^t$, $(1, -1, 1)^t$, and $(-1, 0, 1)^t$ are eigenvectors
of $A^t A$ corresponding to, respectively, the eigenvalues $\lambda_1 = 5$, $\lambda_2 = 2$, and $\lambda_3 = 1$.
6. Suppose that $A$ is an $m \times n$ matrix. Show that Rank($A$) is the same as Rank($A^t$).
7. Show that Nullity($A$) = Nullity($A^t$) if and only if $A$ is a square matrix.
8. Suppose that $A$ has the singular value decomposition $A = U S V^t$. Determine, with justification, a
singular value decomposition of $A^t$.
9. Suppose that $A$ has the singular value decomposition $A = U S V^t$. Show that Rank($A$) = Rank($S$).
10. Suppose that the $m \times n$ matrix $A$ has the singular value decomposition $A = U S V^t$. Express
Nullity($A$) in terms of Rank($S$).
11. Suppose that the $n \times n$ matrix $A$ has the singular value decomposition $A = U S V^t$. Show that $A^{-1}$
exists if and only if $S^{-1}$ exists, and find a singular value decomposition for $A^{-1}$ when it exists.

12. Part (ii) of Theorem 9.26 states that Nullity($A$) = Nullity($A^t A$). Is it also true that Nullity($A$) =
Nullity($A A^t$)?

13. Part (iii) of Theorem 9.26 states that Rank($A$) = Rank($A^t A$). Is it also true that Rank($A$) = Rank($A A^t$)?

14. Show that if $A$ is an $m \times n$ matrix and $P$ is an $m \times m$ orthogonal matrix, then $PA$ has the same singular
values as $A$.

15. Show that if $A$ is an $n \times n$ nonsingular matrix with singular values $s_1, s_2, \ldots, s_n$, then the $l_2$ condition
number of $A$ is $K_2(A) = s_1 / s_n$.

16. Use the result in Exercise 15 to determine the condition numbers of the nonsingular square matrices
in Exercises 1 and 2.


17. Given the data 


$x_i$   1.0   2.0   3.0   4.0   5.0
$y_i$   1.3   3.5   4.2   5.0   7.0


a. Use the singular value decomposition technique to determine the least squares polynomial of 
degree 1. 


b. Use the singular value decomposition technique to determine the least squares polynomial of 
degree 2. 


18. Given the data 


$x_i$   1.0    1.1    1.3    1.5    1.9    2.1
$y_i$   1.84   1.96   2.21   2.45   2.94   3.18


a. Use the singular value decomposition technique to determine the least squares polynomial of 
degree 2. 

b. Use the singular value decomposition technique to determine the least squares polynomial of 
degree 3. 


9.7 Survey of Methods and Software


The general theme of this chapter is the approximation of eigenvalues and eigenvectors. It 
concluded with a technique for factoring an arbitrary matrix that requires these approxima- 
tion methods. 

The Geršgorin circles give a crude approximation to the location of the eigenvalues of
a matrix. The Power method can be used to find the dominant eigenvalue and an associated 
eigenvector for an arbitrary matrix A. If A is symmetric, the Symmetric Power method 
gives faster convergence to the dominant eigenvalue and an associated eigenvector. The 
Inverse Power method will find the eigenvalue closest to a given value and an associated 
eigenvector. This method is often used to refine an approximate eigenvalue and to compute 
an eigenvector once an eigenvalue has been found by some other technique. 

Deflation methods, such as Wielandt deflation, obtain other eigenvalues once the dom- 
inant eigenvalue is known. These methods are used if only a few eigenvalues are required 
since they are susceptible to round-off error. The Inverse Power method should be used to 
improve the accuracy of approximate eigenvalues obtained from a deflation technique. 

Methods based on similarity transformations, such as Householder’s method, are used
to convert a symmetric matrix into a similar matrix that is tridiagonal (or upper Hessenberg
if the matrix is not symmetric). Techniques such as the QR method can then be applied to
the tridiagonal (or upper-Hessenberg) matrix to obtain approximations to all the eigenvalues.
The associated eigenvectors can be found by using an iterative method, such as the
Inverse Power method, or by modifying the QR method to include the approximation of
eigenvectors. We restricted our study to symmetric matrices and presented the QR method
only to compute eigenvalues for the symmetric case.

The Singular Value Decomposition is discussed in Section 9.6. It is used to factor an
$m \times n$ matrix into the form $U S V^t$, where $U$ is an $m \times m$ orthogonal matrix, $V$ is an $n \times n$
orthogonal matrix, and $S$ is an $m \times n$ matrix whose only nonzero entries are located along the
main diagonal. This factorization has important applications that include image processing,
data compression, and solving over-determined linear systems that arise in least squares
approximations. The singular value decomposition requires the computation of eigenvalues
and eigenvectors, so it is appropriate to have this technique conclude the chapter.

The subroutines in the IMSL and NAG libraries, as well as the routines in Netlib 
and the commands in MATLAB, Maple, and Mathematica are based on those contained 
in EISPACK and LAPACK, packages that were discussed in Section 1.4. In general, the 
subroutines transform a matrix into the appropriate form for the QR method or one of its 
modifications, such as the QL method. The subroutines approximate all the eigenvalues and 
can approximate an associated eigenvector for each eigenvalue. Nonsymmetric matrices 
are generally balanced so that the sums of the magnitudes of the entries in each row and 
in each column are about the same. Householder’s method is then applied to determine a 
similar upper Hessenberg matrix. Eigenvalues can then be computed using the QR or QL 
method. It is also possible to compute the Schur form $S D S^t$, where $S$ is orthogonal and
the diagonal of D holds the eigenvalues of A. The corresponding eigenvectors can then be 
determined. For a symmetric matrix a similar tridiagonal matrix is computed. Eigenvalues 
and eigenvectors can then be computed using the QR or QL method. 

There are special routines that find all the eigenvalues within an interval or region or 
that find only the largest or smallest eigenvalue. Subroutines are also available to determine 
the accuracy of the eigenvalue approximation and the sensitivity of the process to round-off 
error. 

One MATLAB procedure that computes a selected number of eigenvalues and eigenvectors
is based on the implicitly restarted Arnoldi method by Sorensen [So]. There is
also a software package contained in Netlib to solve large sparse eigenvalue problems that is
based on the implicitly restarted Arnoldi method. The implicitly restarted Arnoldi method
is a Krylov subspace method that finds a sequence of Krylov subspaces that converge to a
subspace containing the eigenvalues.

The books by Wilkinson [Wil2] and Wilkinson and Reinsch [WR] are classics in the
study of eigenvalue problems. Stewart [Stew2] is also a good source of information on
the general problem, and Parlett [Par] considers the symmetric problem. A study of the
nonsymmetric problem can be found in Saad [Sa1].

CHAPTER 10

Numerical Solutions of Nonlinear Systems of Equations


The amount of pressure required to sink a large heavy object into soft, homogeneous soil 
lying above a hard base soil can be predicted by the amount of pressure required to sink 
smaller objects in the same soil. Specifically, the amount of pressure p to sink a circular 
plate of radius r a distance d in the soft soil, where the hard base soil lies a distance D > d 
below the surface, can be approximated by an equation of the form 


\[ p = k_1 e^{k_2 r} + k_3 r, \]

where $k_1$, $k_2$, and $k_3$ are constants depending on $d$ and the consistency of the soil, but not
on the radius of the plate.

There are three unknown constants in this equation, so three small plates with differing 
radii are sunk to the same distance. This will determine the minimal size plate required 
to sustain a large load. The loads required for this sinkage are recorded, as shown in the 
accompanying figure. 


This produces the three nonlinear equations

\[
\begin{aligned}
m_1 &= k_1 e^{k_2 r_1} + k_3 r_1, \\
m_2 &= k_1 e^{k_2 r_2} + k_3 r_2, \\
m_3 &= k_1 e^{k_2 r_3} + k_3 r_3,
\end{aligned}
\]

in the three unknowns $k_1$, $k_2$, and $k_3$. Numerical approximation methods are usually needed
for solving systems of equations when the equations are nonlinear. Exercise 12 of Section
10.2 concerns an application of the type described here.

Solving a system of nonlinear equations is a problem that is avoided when possible, 
customarily by approximating the nonlinear system by a system of linear equations. When 
this is unsatisfactory, the problem must be tackled directly. The most straightforward ap- 
proach is to adapt the methods from Chapter 2, which approximate the solutions of a single 
nonlinear equation in one variable, to apply when the single-variable problem is replaced 
by a vector problem that incorporates all the variables. 

The principal tool in Chapter 2 was Newton’s method, a technique that is generally 
quadratically convergent. This is the first technique we modify to solve systems of nonlinear 
equations. Newton’s method, as modified for systems of equations, is quite costly to apply, 
and in Section 10.3 we describe how a modified Secant method can be used to obtain 
approximations more easily, although with a loss of the extremely rapid convergence that 
Newton’s method can produce. 

Section 10.4 describes the method of Steepest Descent. It is only linearly convergent, but 
it does not require the accurate starting approximations needed for more rapidly converging 
techniques. It is often used to find a good initial approximation for Newton’s method or one 
of its modifications. 

In Section 10.5, we give an introduction to continuation methods, which use a parameter 
to move from a problem with an easily determined solution to the solution of the original 
nonlinear problem. 

Many of the proofs of the theoretical results in this chapter are omitted because they 
involve methods that are usually studied in advanced calculus. A good general reference 
for this material is Ortega’s book entitled Numerical Analysis-A Second Course [Or2]. A 
more complete reference is [OR]. 


10.1 Fixed Points for Functions of Several Variables


A system of nonlinear equations has the form

\[
\begin{aligned}
f_1(x_1, x_2, \ldots, x_n) &= 0, \\
f_2(x_1, x_2, \ldots, x_n) &= 0, \\
&\ \ \vdots \\
f_n(x_1, x_2, \ldots, x_n) &= 0,
\end{aligned}
\tag{10.1}
\]

where each function $f_i$ can be thought of as mapping a vector $\mathbf{x} = (x_1, x_2, \ldots, x_n)^t$ of the
$n$-dimensional space $\mathbb{R}^n$ into the real line $\mathbb{R}$. A geometric representation of a nonlinear
system when $n = 2$ is given in Figure 10.1.

This system of $n$ nonlinear equations in $n$ unknowns can also be represented by defining
a function $\mathbf{F}$ mapping $\mathbb{R}^n$ into $\mathbb{R}^n$ as

\[
\mathbf{F}(x_1, x_2, \ldots, x_n) = \bigl( f_1(x_1, x_2, \ldots, x_n),\ f_2(x_1, x_2, \ldots, x_n),\ \ldots,\ f_n(x_1, x_2, \ldots, x_n) \bigr)^t.
\]

If vector notation is used to represent the variables $x_1, x_2, \ldots, x_n$, then system (10.1) assumes
the form

\[ \mathbf{F}(\mathbf{x}) = \mathbf{0}. \tag{10.2} \]

The functions $f_1, f_2, \ldots, f_n$ are called the coordinate functions of $\mathbf{F}$.

Figure 10.1 (surface labels in the figure: $z = f_1(x_1, x_2)$ and $z = f_2(x_1, x_2)$)


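Example 1 Place the $3 \times 3$ nonlinear system

\[
\begin{aligned}
3x_1 - \cos(x_2 x_3) - \frac{1}{2} &= 0, \\
x_1^2 - 81(x_2 + 0.1)^2 + \sin x_3 + 1.06 &= 0, \\
e^{-x_1 x_2} + 20x_3 + \frac{10\pi - 3}{3} &= 0
\end{aligned}
\]

in the form (10.2).

Solution Define the three coordinate functions $f_1$, $f_2$, and $f_3$ from $\mathbb{R}^3$ to $\mathbb{R}$ as

\[
\begin{aligned}
f_1(x_1, x_2, x_3) &= 3x_1 - \cos(x_2 x_3) - \frac{1}{2}, \\
f_2(x_1, x_2, x_3) &= x_1^2 - 81(x_2 + 0.1)^2 + \sin x_3 + 1.06, \\
f_3(x_1, x_2, x_3) &= e^{-x_1 x_2} + 20x_3 + \frac{10\pi - 3}{3}.
\end{aligned}
\]

Then define $\mathbf{F}$ from $\mathbb{R}^3 \to \mathbb{R}^3$ by

\[
\begin{aligned}
\mathbf{F}(\mathbf{x}) &= \mathbf{F}(x_1, x_2, x_3) \\
&= \bigl( f_1(x_1, x_2, x_3),\ f_2(x_1, x_2, x_3),\ f_3(x_1, x_2, x_3) \bigr)^t \\
&= \Bigl( 3x_1 - \cos(x_2 x_3) - \frac{1}{2},\ x_1^2 - 81(x_2 + 0.1)^2 + \sin x_3 + 1.06,\ e^{-x_1 x_2} + 20x_3 + \frac{10\pi - 3}{3} \Bigr)^t.
\end{aligned}
\]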
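In a program, the vector form (10.2) corresponds to a single function that returns all the coordinate values at once. The following is a minimal sketch for the system of Example 1; the name `F` mirrors the notation of the text, and the test point $(0.5, 0, -\pi/6)^t$ is the solution identified later in this section.

```python
from math import cos, sin, exp, pi


def F(x):
    """Return (f1(x), f2(x), f3(x)) for the system of Example 1."""
    x1, x2, x3 = x
    return (
        3 * x1 - cos(x2 * x3) - 0.5,
        x1**2 - 81 * (x2 + 0.1) ** 2 + sin(x3) + 1.06,
        exp(-x1 * x2) + 20 * x3 + (10 * pi - 3) / 3,
    )


# Each coordinate function should vanish, up to round-off, at the solution.
residual = F((0.5, 0.0, -pi / 6))
print(residual)
```

Writing the system this way lets an iterative method treat $\mathbf{F}$ as one object, regardless of the number of equations.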


Before discussing the solution of a system given in the form (10.1) or (10.2), we
need some results concerning continuity and differentiability of functions from $\mathbb{R}^n$ into $\mathbb{R}^n$.
Although this study could be presented directly (see Exercise 12), we use an alternative
method that permits us to present the more theoretically difficult concepts of limits and
continuity in terms of functions from $\mathbb{R}^n$ into $\mathbb{R}$.

Definition 10.1 Let $f$ be a function defined on a set $D \subset \mathbb{R}^n$ and mapping into $\mathbb{R}$. The function $f$ is said to
have the limit $L$ at $\mathbf{x}_0$, written

\[ \lim_{\mathbf{x} \to \mathbf{x}_0} f(\mathbf{x}) = L, \]

if, given any number $\varepsilon > 0$, a number $\delta > 0$ exists with

\[ |f(\mathbf{x}) - L| < \varepsilon, \]

whenever $\mathbf{x} \in D$ and

\[ 0 < \|\mathbf{x} - \mathbf{x}_0\| < \delta. \]

The existence of a limit is also independent of the particular vector norm being used,
as discussed in Section 7.1. Any convenient norm can be used to satisfy the condition in
this definition. The specific value of $\delta$ will depend on the norm chosen, but the existence of
a $\delta$ is independent of the norm.

The notion of a limit permits us to define continuity for functions from $\mathbb{R}^n$ into $\mathbb{R}$.
Although various norms can be used, continuity is independent of the particular choice.

(Continuous definitions for functions of $n$ variables follow from those for a single variable
by replacing, where necessary, absolute values by norms.)

Definition 10.2 Let $f$ be a function from a set $D \subset \mathbb{R}^n$ into $\mathbb{R}$. The function $f$ is continuous at $\mathbf{x}_0 \in D$
provided $\lim_{\mathbf{x} \to \mathbf{x}_0} f(\mathbf{x})$ exists and

\[ \lim_{\mathbf{x} \to \mathbf{x}_0} f(\mathbf{x}) = f(\mathbf{x}_0). \]

Moreover, $f$ is continuous on a set $D$ if $f$ is continuous at every point of $D$. This concept
is expressed by writing $f \in C(D)$.

We can now define the limit and continuity concepts for functions from $\mathbb{R}^n$ into $\mathbb{R}^n$ by
considering the coordinate functions from $\mathbb{R}^n$ into $\mathbb{R}$.

Definition 10.3 Let $\mathbf{F}$ be a function from $D \subset \mathbb{R}^n$ into $\mathbb{R}^n$ of the form

\[ \mathbf{F}(\mathbf{x}) = (f_1(\mathbf{x}), f_2(\mathbf{x}), \ldots, f_n(\mathbf{x}))^t, \]

where $f_i$ is a mapping from $\mathbb{R}^n$ into $\mathbb{R}$ for each $i$. We define

\[ \lim_{\mathbf{x} \to \mathbf{x}_0} \mathbf{F}(\mathbf{x}) = \mathbf{L} = (L_1, L_2, \ldots, L_n)^t, \]

if and only if $\lim_{\mathbf{x} \to \mathbf{x}_0} f_i(\mathbf{x}) = L_i$, for each $i = 1, 2, \ldots, n$.

The function $\mathbf{F}$ is continuous at $\mathbf{x}_0 \in D$ provided $\lim_{\mathbf{x} \to \mathbf{x}_0} \mathbf{F}(\mathbf{x})$ exists and
$\lim_{\mathbf{x} \to \mathbf{x}_0} \mathbf{F}(\mathbf{x}) = \mathbf{F}(\mathbf{x}_0)$. In addition, $\mathbf{F}$ is continuous on the set $D$ if $\mathbf{F}$ is continuous at each
$\mathbf{x}$ in $D$. This concept is expressed by writing $\mathbf{F} \in C(D)$.

For functions from $\mathbb{R}$ into $\mathbb{R}$, continuity can often be shown by demonstrating that the
function is differentiable (see Theorem 1.6). Although this theorem generalizes to functions
of several variables, the derivative (or total derivative) of a function of several variables is
quite involved and will not be presented here. Instead we state the following theorem, which
relates the continuity of a function of $n$ variables at a point to the partial derivatives of the
function at the point.

Theorem 10.4 Let $f$ be a function from $D \subset \mathbb{R}^n$ into $\mathbb{R}$ and $\mathbf{x}_0 \in D$. Suppose that all the partial derivatives
of $f$ exist and constants $\delta > 0$ and $K > 0$ exist so that whenever $\|\mathbf{x} - \mathbf{x}_0\| < \delta$ and $\mathbf{x} \in D$,
we have

\[ \left| \frac{\partial f(\mathbf{x})}{\partial x_j} \right| \le K, \quad \text{for each } j = 1, 2, \ldots, n. \]

Then $f$ is continuous at $\mathbf{x}_0$.


Fixed Points in R” 


In Chapter 2, an iterative process for solving an equation f(x) = 0 was developed by first 
transforming the equation into the fixed-point form x = g(x). A similar procedure will be 
investigated for functions from R” into R”. 


Definition 10.5 A function $\mathbf{G}$ from $D \subset \mathbb{R}^n$ into $\mathbb{R}^n$ has a fixed point at $\mathbf{p} \in D$ if $\mathbf{G}(\mathbf{p}) = \mathbf{p}$.


The following theorem extends the Fixed-Point Theorem 2.4 on page 62 to the 
n-dimensional case. This theorem is a special case of the Contraction Mapping Theorem, 
and its proof can be found in [Or2], p. 153. 


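Theorem 10.6 Let $D = \{ (x_1, x_2, \ldots, x_n)^t \mid a_i \le x_i \le b_i, \text{ for each } i = 1, 2, \ldots, n \}$ for some collection of
constants $a_1, a_2, \ldots, a_n$ and $b_1, b_2, \ldots, b_n$. Suppose $\mathbf{G}$ is a continuous function from $D \subset \mathbb{R}^n$
into $\mathbb{R}^n$ with the property that $\mathbf{G}(\mathbf{x}) \in D$ whenever $\mathbf{x} \in D$. Then $\mathbf{G}$ has a fixed point in $D$.
Moreover, suppose that all the component functions of $\mathbf{G}$ have continuous partial derivatives
and a constant $K < 1$ exists with

\[ \left| \frac{\partial g_i(\mathbf{x})}{\partial x_j} \right| \le \frac{K}{n}, \quad \text{whenever } \mathbf{x} \in D, \]

for each $j = 1, 2, \ldots, n$ and each component function $g_i$. Then the sequence $\{\mathbf{x}^{(k)}\}_{k=0}^{\infty}$ defined
by an arbitrarily selected $\mathbf{x}^{(0)}$ in $D$ and generated by

\[ \mathbf{x}^{(k)} = \mathbf{G}(\mathbf{x}^{(k-1)}), \quad \text{for each } k \ge 1 \]

converges to the unique fixed point $\mathbf{p} \in D$ and

\[ \|\mathbf{x}^{(k)} - \mathbf{p}\|_\infty \le \frac{K^k}{1 - K} \|\mathbf{x}^{(1)} - \mathbf{x}^{(0)}\|_\infty. \tag{10.3} \]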


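Example 2 Place the nonlinear system

\[
\begin{aligned}
3x_1 - \cos(x_2 x_3) - \frac{1}{2} &= 0, \\
x_1^2 - 81(x_2 + 0.1)^2 + \sin x_3 + 1.06 &= 0, \\
e^{-x_1 x_2} + 20x_3 + \frac{10\pi - 3}{3} &= 0
\end{aligned}
\]

in a fixed-point form $\mathbf{x} = \mathbf{G}(\mathbf{x})$ by solving the $i$th equation for $x_i$, show that there is a unique
solution on

\[ D = \{ (x_1, x_2, x_3)^t \mid -1 \le x_i \le 1, \text{ for each } i = 1, 2, 3 \}, \]

and iterate starting with $\mathbf{x}^{(0)} = (0.1, 0.1, -0.1)^t$ until accuracy within $10^{-5}$ in the $l_\infty$ norm
is obtained.

Solution Solving the $i$th equation for $x_i$ gives the fixed-point problem

\[
\begin{aligned}
x_1 &= \frac{1}{3} \cos(x_2 x_3) + \frac{1}{6}, \\
x_2 &= \frac{1}{9} \sqrt{x_1^2 + \sin x_3 + 1.06} - 0.1, \\
x_3 &= -\frac{1}{20} e^{-x_1 x_2} - \frac{10\pi - 3}{60}.
\end{aligned}
\tag{10.4}
\]

Let $\mathbf{G} : \mathbb{R}^3 \to \mathbb{R}^3$ be defined by $\mathbf{G}(\mathbf{x}) = (g_1(\mathbf{x}), g_2(\mathbf{x}), g_3(\mathbf{x}))^t$, where

\[
\begin{aligned}
g_1(x_1, x_2, x_3) &= \frac{1}{3} \cos(x_2 x_3) + \frac{1}{6}, \\
g_2(x_1, x_2, x_3) &= \frac{1}{9} \sqrt{x_1^2 + \sin x_3 + 1.06} - 0.1, \\
g_3(x_1, x_2, x_3) &= -\frac{1}{20} e^{-x_1 x_2} - \frac{10\pi - 3}{60}.
\end{aligned}
\]

Theorems 10.4 and 10.6 will be used to show that $\mathbf{G}$ has a unique fixed point in

\[ D = \{ (x_1, x_2, x_3)^t \mid -1 \le x_i \le 1, \text{ for each } i = 1, 2, 3 \}. \]

For $\mathbf{x} = (x_1, x_2, x_3)^t$ in $D$,

\[
\begin{aligned}
|g_1(x_1, x_2, x_3)| &\le \frac{1}{3} |\cos(x_2 x_3)| + \frac{1}{6} \le 0.50, \\
|g_2(x_1, x_2, x_3)| &= \left| \frac{1}{9} \sqrt{x_1^2 + \sin x_3 + 1.06} - 0.1 \right| \le \frac{1}{9} \sqrt{1 + \sin 1 + 1.06} - 0.1 < 0.09,
\end{aligned}
\]

and

\[ |g_3(x_1, x_2, x_3)| = \frac{1}{20} e^{-x_1 x_2} + \frac{10\pi - 3}{60} \le \frac{1}{20} e + \frac{10\pi - 3}{60} < 0.61. \]

So we have, for each $i = 1, 2, 3$,

\[ -1 \le g_i(x_1, x_2, x_3) \le 1. \]

Thus $\mathbf{G}(\mathbf{x}) \in D$ whenever $\mathbf{x} \in D$.

Finding bounds for the partial derivatives on $D$ gives

\[ \left| \frac{\partial g_1}{\partial x_1} \right| = 0, \quad \left| \frac{\partial g_2}{\partial x_2} \right| = 0, \quad \text{and} \quad \left| \frac{\partial g_3}{\partial x_3} \right| = 0, \]

as well as

\[
\begin{aligned}
\left| \frac{\partial g_1}{\partial x_2} \right| &\le \frac{1}{3} |x_3| \cdot |\sin x_2 x_3| \le \frac{1}{3} \sin 1 < 0.281, &
\left| \frac{\partial g_1}{\partial x_3} \right| &\le \frac{1}{3} |x_2| \cdot |\sin x_2 x_3| \le \frac{1}{3} \sin 1 < 0.281, \\
\left| \frac{\partial g_2}{\partial x_1} \right| &= \frac{|x_1|}{9 \sqrt{x_1^2 + \sin x_3 + 1.06}} < \frac{1}{9 \sqrt{0.218}} < 0.238, &
\left| \frac{\partial g_2}{\partial x_3} \right| &= \frac{|\cos x_3|}{18 \sqrt{x_1^2 + \sin x_3 + 1.06}} < \frac{1}{18 \sqrt{0.218}} < 0.119, \\
\left| \frac{\partial g_3}{\partial x_1} \right| &= \frac{|x_2|}{20} e^{-x_1 x_2} \le \frac{1}{20} e < 0.14, &
\left| \frac{\partial g_3}{\partial x_2} \right| &= \frac{|x_1|}{20} e^{-x_1 x_2} \le \frac{1}{20} e < 0.14.
\end{aligned}
\]

The partial derivatives of $g_1$, $g_2$, and $g_3$ are all bounded on $D$, so Theorem 10.4 implies
that these functions are continuous on $D$. Consequently, $\mathbf{G}$ is continuous on $D$. Moreover,
for every $\mathbf{x} \in D$,

\[ \left| \frac{\partial g_i(\mathbf{x})}{\partial x_j} \right| \le 0.281, \quad \text{for each } i = 1, 2, 3 \text{ and } j = 1, 2, 3, \]

and the condition in the second part of Theorem 10.6 holds with $K = 3(0.281) = 0.843$.

In the same manner it can also be shown that $\partial g_i / \partial x_j$ is continuous on $D$ for each
$i = 1, 2, 3$ and $j = 1, 2, 3$. (This is considered in Exercise 3.) Consequently, $\mathbf{G}$ has a unique
fixed point in $D$, and the nonlinear system has a solution in $D$.

Note that $\mathbf{G}$ having a unique fixed point in $D$ does not imply that the solution to the
original system is unique on this domain, because the solution for $x_2$ in (10.4) involved
the choice of the principal square root. Exercise 7(d) examines the situation that occurs if
the negative square root is instead chosen in this step.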

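To approximate the fixed point $\mathbf{p}$, we choose $\mathbf{x}^{(0)} = (0.1, 0.1, -0.1)^t$. The sequence of
vectors generated by

\[
\begin{aligned}
x_1^{(k)} &= \frac{1}{3} \cos\bigl( x_2^{(k-1)} x_3^{(k-1)} \bigr) + \frac{1}{6}, \\
x_2^{(k)} &= \frac{1}{9} \sqrt{\bigl( x_1^{(k-1)} \bigr)^2 + \sin x_3^{(k-1)} + 1.06} - 0.1, \\
x_3^{(k)} &= -\frac{1}{20} e^{-x_1^{(k-1)} x_2^{(k-1)}} - \frac{10\pi - 3}{60}
\end{aligned}
\]

converges to the unique solution of the system in (10.4). The results in Table 10.1 were
generated until

\[ \|\mathbf{x}^{(k)} - \mathbf{x}^{(k-1)}\|_\infty < 10^{-5}. \]

Table 10.1
k   $x_1^{(k)}$    $x_2^{(k)}$    $x_3^{(k)}$     $\|\mathbf{x}^{(k)} - \mathbf{x}^{(k-1)}\|_\infty$
0   0.10000000   0.10000000   -0.10000000
1   0.49998333   0.00944115   -0.52310127   0.423
2   0.49999593   0.00002557   -0.52336331   $9.4 \times 10^{-3}$
3   0.50000000   0.00001234   -0.52359814   $2.3 \times 10^{-4}$
4   0.50000000   0.00000003   -0.52359847   $1.2 \times 10^{-5}$
5   0.50000000   0.00000002   -0.52359877   $3.1 \times 10^{-7}$

We could use the error bound (10.3) with $K = 0.843$ in the previous example. This
gives

\[ \|\mathbf{x}^{(5)} - \mathbf{p}\|_\infty \le \frac{(0.843)^5}{1 - 0.843} (0.423) < 1.15, \]

which does not indicate the true accuracy of $\mathbf{x}^{(5)}$. The actual solution is

\[ \mathbf{p} = \left( 0.5, 0, -\frac{\pi}{6} \right)^t \approx (0.5, 0, -0.5235987757)^t, \quad \text{so} \quad \|\mathbf{x}^{(5)} - \mathbf{p}\|_\infty \le 2 \times 10^{-8}. \]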
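The iteration of Example 2 and the bound (10.3) can be reproduced with a few lines of code. The following is a minimal sketch under our own naming (`G`, `fixed_point`, `a_priori_bound`); it stops on the $l_\infty$ norm of the difference of successive iterates, as in the example, and the cap of 50 iterations is an arbitrary safeguard.

```python
from math import cos, sin, exp, sqrt, pi


def G(x):
    """Fixed-point map (10.4), obtained by solving the i-th equation for x_i."""
    x1, x2, x3 = x
    return (
        cos(x2 * x3) / 3 + 1 / 6,
        sqrt(x1**2 + sin(x3) + 1.06) / 9 - 0.1,
        -exp(-x1 * x2) / 20 - (10 * pi - 3) / 60,
    )


def fixed_point(G, x0, tol=1e-5, max_iter=50):
    """Iterate x = G(x) until successive iterates differ by less than tol (l_inf norm)."""
    x = tuple(x0)
    steps = []
    for k in range(1, max_iter + 1):
        x_new = G(x)
        step = max(abs(a - b) for a, b in zip(x_new, x))
        steps.append(step)
        x = x_new
        if step < tol:
            return x, k, steps
    raise RuntimeError("no convergence within max_iter iterations")


def a_priori_bound(K, k, first_step):
    """Right side of (10.3): K^k / (1 - K) * ||x^(1) - x^(0)||."""
    return K**k / (1 - K) * first_step


x, k, steps = fixed_point(G, (0.1, 0.1, -0.1))
bound = a_priori_bound(0.843, 5, 0.423)
print(x, k, steps[0], bound)
```

The computed bound is far larger than the observed differences between iterates, which illustrates that (10.3) guarantees convergence but can be a very pessimistic estimate of the error.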
Accelerating Convergence

One way to accelerate convergence of the fixed-point iteration is to use the latest estimates
$x_1^{(k)}, \ldots, x_{i-1}^{(k)}$ instead of $x_1^{(k-1)}, \ldots, x_{i-1}^{(k-1)}$ to compute $x_i^{(k)}$, as in the Gauss-Seidel method
for linear systems. The component equations for the problem in the example then become

\[
\begin{aligned}
x_1^{(k)} &= \frac{1}{3} \cos\bigl( x_2^{(k-1)} x_3^{(k-1)} \bigr) + \frac{1}{6}, \\
x_2^{(k)} &= \frac{1}{9} \sqrt{\bigl( x_1^{(k)} \bigr)^2 + \sin x_3^{(k-1)} + 1.06} - 0.1, \\
x_3^{(k)} &= -\frac{1}{20} e^{-x_1^{(k)} x_2^{(k)}} - \frac{10\pi - 3}{60}.
\end{aligned}
\]

With $\mathbf{x}^{(0)} = (0.1, 0.1, -0.1)^t$, the results of these calculations are listed in Table 10.2.

Table 10.2
k   $x_1^{(k)}$    $x_2^{(k)}$    $x_3^{(k)}$     $\|\mathbf{x}^{(k)} - \mathbf{x}^{(k-1)}\|_\infty$
0   0.10000000   0.10000000   -0.10000000
1   0.49998333   0.02222979   -0.52304613   0.423
2   0.49997747   0.00002815   -0.52359807   $2.2 \times 10^{-2}$
3   0.50000000   0.00000004   -0.52359877   $2.8 \times 10^{-5}$
4   0.50000000   0.00000000   -0.52359877   $3.8 \times 10^{-8}$

The iterate $\mathbf{x}^{(4)}$ is accurate to within $10^{-7}$ in the $l_\infty$ norm; so the convergence was indeed
accelerated for this problem by using the Gauss-Seidel method. However, this method does
not always accelerate the convergence.
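The Gauss-Seidel variant differs from the basic iteration only in that each newly computed component is used immediately. A minimal sketch for the system of Example 2 follows; the names `gauss_seidel_step` and `iterate` are ours, and the stopping test is again the $l_\infty$ norm of the difference of successive iterates.

```python
from math import cos, sin, exp, sqrt, pi


def gauss_seidel_step(x):
    """One sweep in which each updated component is reused within the same sweep."""
    x1, x2, x3 = x
    x1 = cos(x2 * x3) / 3 + 1 / 6                      # uses old x2, x3
    x2 = sqrt(x1**2 + sin(x3) + 1.06) / 9 - 0.1        # uses new x1, old x3
    x3 = -exp(-x1 * x2) / 20 - (10 * pi - 3) / 60      # uses new x1, new x2
    return (x1, x2, x3)


def iterate(step, x0, tol=1e-5, max_iter=50):
    """Apply step repeatedly until successive iterates differ by less than tol."""
    x = tuple(x0)
    for k in range(1, max_iter + 1):
        x_new = step(x)
        diff = max(abs(a - b) for a, b in zip(x_new, x))
        x = x_new
        if diff < tol:
            return x, k
    raise RuntimeError("no convergence within max_iter iterations")


x_gs, k_gs = iterate(gauss_seidel_step, (0.1, 0.1, -0.1))
print(x_gs, k_gs)
```

Because the only change is the order in which values are overwritten, the same `iterate` driver can be reused with any other sweep function when comparing iteration counts.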

Maple provides the function fsolve to solve systems of equations. The fixed-point
problem of Example 2 can be solved with the following commands:

    g1 := x1 = (1/3)*cos(x2*x3) + 1/6:
    g2 := x2 = (1/9)*sqrt(x1^2 + sin(x3) + 1.06) - 0.1:
    g3 := x3 = -(1/20)*exp(-x1*x2) - (10*Pi - 3)/60:
    fsolve({g1, g2, g3}, {x1, x2, x3}, {x1 = -1..1, x2 = -1..1, x3 = -1..1});

The first three commands define the system, and the last command invokes the procedure
fsolve. Maple displays the answer as

    {x1 = 0.5000000000, x2 = -2.079196195 × 10^(-11), x3 = -0.5235987758}

In general, fsolve(eqns,vars,options) solves the system of equations represented by the
parameter eqns for the variables represented by the parameter vars under optional parameters
represented by options. Under options we specify a region in which the routine is required
to search for a solution. This specification is not mandatory, and Maple determines its own
search space if the options are omitted.


EXERCISE SET 10.1 


1. Show that the function $\mathbf{F} : \mathbb{R}^3 \to \mathbb{R}^3$ defined by

\[ \mathbf{F}(x_1, x_2, x_3) = (x_1 + 2x_3,\ x_1 \cos x_2,\ x_2^2 + x_3)^t \]

is continuous at each point of $\mathbb{R}^3$.

2. Give an example of a function $\mathbf{F} : \mathbb{R}^2 \to \mathbb{R}^2$ that is continuous at each point of $\mathbb{R}^2$, except at $(1, 0)$.

3. Show that the first partial derivatives in Example 2 are continuous on $D$.

4. The nonlinear system

\[ x_1(x_1 + 1) + 2x_2 = 18, \quad (x_1 - 1)^2 + (x_2 - 6)^2 = 25 \]

has two solutions.
a. Approximate the solutions graphically.
b. Use the approximations from part (a) as initial approximations for an appropriate function
iteration, and determine the solutions to within $10^{-5}$ in the $l_\infty$ norm.

5. The nonlinear system

\[ x_1^2 - 10x_1 + x_2^2 + 8 = 0, \quad x_1 x_2^2 + x_1 - 10x_2 + 8 = 0 \]

can be transformed into the fixed-point problem

\[ x_1 = g_1(x_1, x_2) = \frac{x_1^2 + x_2^2 + 8}{10}, \quad x_2 = g_2(x_1, x_2) = \frac{x_1 x_2^2 + x_1 + 8}{10}. \]

a. Use Theorem 10.6 to show that $\mathbf{G} = (g_1, g_2)^t$ mapping $D \subset \mathbb{R}^2$ into $\mathbb{R}^2$ has a unique fixed point
in

\[ D = \{ (x_1, x_2)^t \mid 0 \le x_1, x_2 \le 1.5 \}. \]

b. Apply functional iteration to approximate the solution.
c. Does the Gauss-Seidel method accelerate convergence?

6. The nonlinear system

\[ 5x_1^2 - x_2^2 = 0, \quad x_2 - 0.25(\sin x_1 + \cos x_2) = 0 \]

has a solution near $\left( \frac{1}{4}, \frac{1}{4} \right)^t$.
a. Find a function $\mathbf{G}$ and a set $D$ in $\mathbb{R}^2$ such that $\mathbf{G} : D \to \mathbb{R}^2$ and $\mathbf{G}$ has a unique fixed point
in $D$.
b. Apply functional iteration to approximate the solution to within $10^{-5}$ in the $l_\infty$ norm.
c. Does the Gauss-Seidel method accelerate convergence?


7. Use Theorem 10.6 to show that $\mathbf{G} : D \subset \mathbb{R}^3 \to \mathbb{R}^3$ has a unique fixed point in $D$. Apply functional
iteration to approximate the solution to within $10^{-5}$, using the $l_\infty$ norm.

a. $\mathbf{G}(x_1, x_2, x_3) = \left( \dfrac{\cos(x_2 x_3) + 0.5}{3},\ \dfrac{1}{25} \sqrt{x_1^2 + 0.3125} - 0.03,\ -\dfrac{1}{20} e^{-x_1 x_2} - \dfrac{10\pi - 3}{60} \right)^t$;
$D = \{ (x_1, x_2, x_3)^t \mid -1 \le x_i \le 1,\ i = 1, 2, 3 \}$

b. $\mathbf{G}(x_1, x_2, x_3) = \left( \dfrac{13 - x_2^2 + 4x_3}{15},\ \dfrac{11 + x_3 - x_1^2}{10},\ \dfrac{22 + x_2^3}{25} \right)^t$;
$D = \{ (x_1, x_2, x_3)^t \mid 0 \le x_i \le 1.5,\ i = 1, 2, 3 \}$

c. $\mathbf{G}(x_1, x_2, x_3) = \left( 1 - \cos(x_1 x_2 x_3),\ 1 - (1 - x_1)^{1/4} - 0.05x_3^2 + 0.15x_3,\ x_1^2 + 0.1x_2^2 - 0.01x_2 + 1 \right)^t$;
$D = \{ (x_1, x_2, x_3)^t \mid -0.1 \le x_1 \le 0.1,\ -0.1 \le x_2 \le 0.3,\ 0.5 \le x_3 \le 1.1 \}$

d. $\mathbf{G}(x_1, x_2, x_3) = \left( \dfrac{1}{3} \cos(x_2 x_3) + \dfrac{1}{6},\ -\dfrac{1}{9} \sqrt{x_1^2 + \sin x_3 + 1.06} - 0.1,\ -\dfrac{1}{20} e^{-x_1 x_2} - \dfrac{10\pi - 3}{60} \right)^t$;
$D = \{ (x_1, x_2, x_3)^t \mid -1 \le x_i \le 1,\ i = 1, 2, 3 \}$

8. Use functional iteration to find solutions to the following nonlinear systems, accurate to within $10^{-5}$,
using the $l_\infty$ norm.

a. $x_1^2 + x_2^2 - x_1 = 0$,
   $x_1^2 - x_2^2 - x_2 = 0$.
b. $3x_1^2 - x_2^2 = 0$,
   $3x_1 x_2^2 - x_1^3 - 1 = 0$.
c. $x_1^2 + x_2 - 37 = 0$,
   $x_1 - x_2^2 - 5 = 0$,
   $x_1 + x_2 + x_3 - 3 = 0$.
d. $x_1^2 + 2x_2^2 - x_2 - 2x_3 = 0$,
   $x_1^2 - 8x_2^2 + 10x_3 = 0$,
   $\dfrac{x_1^2}{7x_2 x_3} - 1 = 0$.

9. Use the Gauss-Seidel method to approximate the fixed points in Exercise 7 to within $10^{-5}$, using the
$l_\infty$ norm.
10. Repeat Exercise 8 using the Gauss-Seidel method.
11. In Exercise 10 of Section 5.9, we considered the problem of predicting the population of two species
that compete for the same food supply. In the problem, we made the assumption that the populations
could be predicted by solving the system of equations

\[ \frac{dx_1(t)}{dt} = x_1(t) \bigl( 4 - 0.0003x_1(t) - 0.0004x_2(t) \bigr) \]

and

\[ \frac{dx_2(t)}{dt} = x_2(t) \bigl( 2 - 0.0002x_1(t) - 0.0001x_2(t) \bigr). \]

In this exercise, we would like to consider the problem of determining equilibrium populations of
the two species. The mathematical criterion that must be satisfied in order for the populations to be at
equilibrium is that, simultaneously,

\[ \frac{dx_1(t)}{dt} = 0 \quad \text{and} \quad \frac{dx_2(t)}{dt} = 0. \]

This occurs when the first species is extinct and the second species has a population of 20,000 or
when the second species is extinct and the first species has a population of 13,333. Can an equilibrium
occur in any other situation?

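12. Show that a function $\mathbf{F}$ mapping $D \subset \mathbb{R}^n$ into $\mathbb{R}^n$ is continuous at $\mathbf{x}_0 \in D$ precisely when, given any
number $\varepsilon > 0$, a number $\delta > 0$ can be found with the property that for any vector norm $\| \cdot \|$,

\[ \|\mathbf{F}(\mathbf{x}) - \mathbf{F}(\mathbf{x}_0)\| < \varepsilon, \]

whenever $\mathbf{x} \in D$ and $\|\mathbf{x} - \mathbf{x}_0\| < \delta$.

13. Let $A$ be an $n \times n$ matrix and $\mathbf{F}$ be the function from $\mathbb{R}^n$ to $\mathbb{R}^n$ defined by $\mathbf{F}(\mathbf{x}) = A\mathbf{x}$. Use the result
in Exercise 12 to show that $\mathbf{F}$ is continuous on $\mathbb{R}^n$.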


10.2 Newton's Method


The problem in Example 2 of Section 10.1 is transformed into a convergent fixed-point 
problem by algebraically solving the three equations for the three variables $x_1$, $x_2$, and $x_3$.
It is, however, unusual to be able to find an explicit representation for all the variables. In 
this section, we consider an algorithmic procedure to perform the transformation in a more 
general situation. 

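To construct the algorithm that led to an appropriate fixed-point method in the
one-dimensional case, we found a function $\phi$ with the property that

\[ g(x) = x - \phi(x) f(x) \]

gives quadratic convergence to the fixed point $p$ of the function $g$ (see Section 2.4). From this
condition Newton’s method evolved by choosing $\phi(x) = 1/f'(x)$, assuming that $f'(x) \ne 0$.

A similar approach in the $n$-dimensional case involves a matrix

\[
A(\mathbf{x}) =
\begin{bmatrix}
a_{11}(\mathbf{x}) & a_{12}(\mathbf{x}) & \cdots & a_{1n}(\mathbf{x}) \\
a_{21}(\mathbf{x}) & a_{22}(\mathbf{x}) & \cdots & a_{2n}(\mathbf{x}) \\
\vdots & \vdots & & \vdots \\
a_{n1}(\mathbf{x}) & a_{n2}(\mathbf{x}) & \cdots & a_{nn}(\mathbf{x})
\end{bmatrix},
\tag{10.5}
\]

where each of the entries $a_{ij}(\mathbf{x})$ is a function from $\mathbb{R}^n$ into $\mathbb{R}$. This requires that $A(\mathbf{x})$ be
found so that

\[ \mathbf{G}(\mathbf{x}) = \mathbf{x} - A(\mathbf{x})^{-1} \mathbf{F}(\mathbf{x}) \]

gives quadratic convergence to the solution of $\mathbf{F}(\mathbf{x}) = \mathbf{0}$, assuming that $A(\mathbf{x})$ is nonsingular
at the fixed point $\mathbf{p}$ of $\mathbf{G}$.

The following theorem parallels Theorem 2.8 on page 80. Its proof requires being able
to express $\mathbf{G}$ in terms of its Taylor series in $n$ variables about $\mathbf{p}$.

Theorem 10.7 Let $\mathbf{p}$ be a solution of $\mathbf{G}(\mathbf{x}) = \mathbf{x}$. Suppose a number $\delta > 0$ exists with

(i) $\partial g_i / \partial x_j$ is continuous on $N_\delta = \{ \mathbf{x} \mid \|\mathbf{x} - \mathbf{p}\| < \delta \}$, for each $i = 1, 2, \ldots, n$ and
$j = 1, 2, \ldots, n$;

(ii) $\partial^2 g_i(\mathbf{x}) / (\partial x_j \partial x_k)$ is continuous, and $|\partial^2 g_i(\mathbf{x}) / (\partial x_j \partial x_k)| \le M$ for some constant
$M$, whenever $\mathbf{x} \in N_\delta$, for each $i = 1, 2, \ldots, n$, $j = 1, 2, \ldots, n$, and $k = 1, 2, \ldots, n$;

(iii) $\partial g_i(\mathbf{p}) / \partial x_k = 0$, for each $i = 1, 2, \ldots, n$ and $k = 1, 2, \ldots, n$.

Then a number $\hat{\delta} \le \delta$ exists such that the sequence generated by $\mathbf{x}^{(k)} = \mathbf{G}(\mathbf{x}^{(k-1)})$ converges
quadratically to $\mathbf{p}$ for any choice of $\mathbf{x}^{(0)}$, provided that $\|\mathbf{x}^{(0)} - \mathbf{p}\| < \hat{\delta}$. Moreover,

\[ \|\mathbf{x}^{(k)} - \mathbf{p}\|_\infty \le \frac{n^2 M}{2} \|\mathbf{x}^{(k-1)} - \mathbf{p}\|_\infty^2, \quad \text{for each } k \ge 1. \]

To apply Theorem 10.7, suppose that $A(\mathbf{x})$ is an $n \times n$ matrix of functions from $\mathbb{R}^n$
into $\mathbb{R}$ in the form of Eq. (10.5), where the specific entries will be chosen later. Assume,
moreover, that $A(\mathbf{x})$ is nonsingular near a solution $\mathbf{p}$ of $\mathbf{F}(\mathbf{x}) = \mathbf{0}$, and let $b_{ij}(\mathbf{x})$ denote the
entry of $A(\mathbf{x})^{-1}$ in the $i$th row and $j$th column.

For $\mathbf{G}(\mathbf{x}) = \mathbf{x} - A(\mathbf{x})^{-1} \mathbf{F}(\mathbf{x})$, we have $g_i(\mathbf{x}) = x_i - \sum_{j=1}^{n} b_{ij}(\mathbf{x}) f_j(\mathbf{x})$. So

\[
\frac{\partial g_i}{\partial x_k}(\mathbf{x}) =
\begin{cases}
1 - \displaystyle\sum_{j=1}^{n} \left( b_{ij}(\mathbf{x}) \frac{\partial f_j}{\partial x_k}(\mathbf{x}) + \frac{\partial b_{ij}}{\partial x_k}(\mathbf{x}) f_j(\mathbf{x}) \right), & \text{if } i = k, \\[2ex]
- \displaystyle\sum_{j=1}^{n} \left( b_{ij}(\mathbf{x}) \frac{\partial f_j}{\partial x_k}(\mathbf{x}) + \frac{\partial b_{ij}}{\partial x_k}(\mathbf{x}) f_j(\mathbf{x}) \right), & \text{if } i \ne k.
\end{cases}
\]

Theorem 10.7 implies that we need $\partial g_i(\mathbf{p}) / \partial x_k = 0$, for each $i = 1, 2, \ldots, n$ and
$k = 1, 2, \ldots, n$. Since $f_j(\mathbf{p}) = 0$ for each $j$, this means that for $i = k$,

\[ 0 = 1 - \sum_{j=1}^{n} b_{ij}(\mathbf{p}) \frac{\partial f_j}{\partial x_i}(\mathbf{p}), \]

that is,

\[ \sum_{j=1}^{n} b_{ij}(\mathbf{p}) \frac{\partial f_j}{\partial x_i}(\mathbf{p}) = 1. \tag{10.6} \]

When $k \ne i$,

\[ 0 = - \sum_{j=1}^{n} b_{ij}(\mathbf{p}) \frac{\partial f_j}{\partial x_k}(\mathbf{p}), \]

so

\[ \sum_{j=1}^{n} b_{ij}(\mathbf{p}) \frac{\partial f_j}{\partial x_k}(\mathbf{p}) = 0. \tag{10.7} \]

The Jacobian Matrix

Define the matrix $J(\mathbf{x})$ by

\[
J(\mathbf{x}) =
\begin{bmatrix}
\dfrac{\partial f_1}{\partial x_1}(\mathbf{x}) & \dfrac{\partial f_1}{\partial x_2}(\mathbf{x}) & \cdots & \dfrac{\partial f_1}{\partial x_n}(\mathbf{x}) \\[1.5ex]
\dfrac{\partial f_2}{\partial x_1}(\mathbf{x}) & \dfrac{\partial f_2}{\partial x_2}(\mathbf{x}) & \cdots & \dfrac{\partial f_2}{\partial x_n}(\mathbf{x}) \\
\vdots & \vdots & & \vdots \\
\dfrac{\partial f_n}{\partial x_1}(\mathbf{x}) & \dfrac{\partial f_n}{\partial x_2}(\mathbf{x}) & \cdots & \dfrac{\partial f_n}{\partial x_n}(\mathbf{x})
\end{bmatrix}.
\tag{10.8}
\]

Then conditions (10.6) and (10.7) require that

\[ A(\mathbf{p})^{-1} J(\mathbf{p}) = I, \text{ the identity matrix, so } A(\mathbf{p}) = J(\mathbf{p}). \]

An appropriate choice for $A(\mathbf{x})$ is, consequently, $A(\mathbf{x}) = J(\mathbf{x})$ since this satisfies condition
(iii) in Theorem 10.7. The function $\mathbf{G}$ is defined by

\[ \mathbf{G}(\mathbf{x}) = \mathbf{x} - J(\mathbf{x})^{-1} \mathbf{F}(\mathbf{x}), \]

and the functional iteration procedure evolves from selecting $\mathbf{x}^{(0)}$ and generating, for $k \ge 1$,

\[ \mathbf{x}^{(k)} = \mathbf{G}(\mathbf{x}^{(k-1)}) = \mathbf{x}^{(k-1)} - J(\mathbf{x}^{(k-1)})^{-1} \mathbf{F}(\mathbf{x}^{(k-1)}). \tag{10.9} \]

This is called Newton’s method for nonlinear systems, and it is generally expected
to give quadratic convergence, provided that a sufficiently accurate starting value is known
and that $J(\mathbf{p})^{-1}$ exists. The matrix $J(\mathbf{x})$ is called the Jacobian matrix and has a number of
applications in analysis. It might, in particular, be familiar to the reader due to its application
in the multiple integration of a function of several variables over a region that requires a
change of variables to be performed.

(The Jacobian matrix first appeared in an 1815 paper by Cauchy, but Jacobi wrote De
determinantibus functionalibus in 1841 and proved numerous results about this matrix.)

A weakness in Newton’s method arises from the need to compute and invert the matrix
$J(\mathbf{x})$ at each step. In practice, explicit computation of $J(\mathbf{x})^{-1}$ is avoided by performing
the operation in a two-step manner. First, a vector $\mathbf{y}$ is found that satisfies $J(\mathbf{x}^{(k-1)}) \mathbf{y} =
-\mathbf{F}(\mathbf{x}^{(k-1)})$. Then the new approximation, $\mathbf{x}^{(k)}$, is obtained by adding $\mathbf{y}$ to $\mathbf{x}^{(k-1)}$. Algorithm
10.1 uses this two-step procedure.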
Newton's Method for Systems

To approximate the solution of the nonlinear system $\mathbf{F}(\mathbf{x}) = \mathbf{0}$ given an initial approximation $\mathbf{x}$:

INPUT number $n$ of equations and unknowns; initial approximation $\mathbf{x} = (x_1, \ldots, x_n)^t$;
tolerance TOL; maximum number of iterations $N$.

OUTPUT approximate solution $\mathbf{x} = (x_1, \ldots, x_n)^t$ or a message that the number of
iterations was exceeded.

Step 1 Set $k = 1$.

Step 2 While ($k \leq N$) do Steps 3-7.

    Step 3 Calculate $\mathbf{F}(\mathbf{x})$ and $J(\mathbf{x})$, where $J(\mathbf{x})_{i,j} = (\partial f_i(\mathbf{x})/\partial x_j)$ for $1 \leq i, j \leq n$.

    Step 4 Solve the $n \times n$ linear system $J(\mathbf{x})\mathbf{y} = -\mathbf{F}(\mathbf{x})$.

    Step 5 Set $\mathbf{x} = \mathbf{x} + \mathbf{y}$.

    Step 6 If $\|\mathbf{y}\| <$ TOL then OUTPUT ($\mathbf{x}$);
           (The procedure was successful.)
           STOP.

    Step 7 Set $k = k + 1$.

Step 8 OUTPUT ('Maximum number of iterations exceeded');
       (The procedure was unsuccessful.)
       STOP.
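
The steps of Algorithm 10.1 can be sketched in Python as follows. This is a minimal illustration rather than a reference implementation: the helper names `solve_linear` and `newton_system`, the default tolerance, and the use of the $l_\infty$ norm in the stopping test are assumptions made for this sketch. The functions `F` and `J` encode the three-equation system treated in Example 1 of this section.

```python
import math

def solve_linear(a, b):
    """Solve a y = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Work on an augmented copy so the caller's data is not modified.
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if m[pivot][col] == 0.0:
            raise ZeroDivisionError("singular Jacobian")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    y = [0.0] * n
    for i in range(n - 1, -1, -1):          # back substitution
        s = sum(m[i][j] * y[j] for j in range(i + 1, n))
        y[i] = (m[i][n] - s) / m[i][i]
    return y

def newton_system(F, J, x0, tol=1e-9, max_iter=25):
    """Newton's method for F(x) = 0 (hypothetical helper name)."""
    x = list(x0)
    for _ in range(max_iter):
        y = solve_linear(J(x), [-v for v in F(x)])   # Step 4: J(x) y = -F(x)
        x = [xi + yi for xi, yi in zip(x, y)]        # Step 5: x = x + y
        if max(abs(yi) for yi in y) < tol:           # Step 6: stopping test
            return x
    raise RuntimeError("Maximum number of iterations exceeded")

def F(x):
    x1, x2, x3 = x
    return [3 * x1 - math.cos(x2 * x3) - 0.5,
            x1 ** 2 - 81 * (x2 + 0.1) ** 2 + math.sin(x3) + 1.06,
            math.exp(-x1 * x2) + 20 * x3 + (10 * math.pi - 3) / 3]

def J(x):
    x1, x2, x3 = x
    e = math.exp(-x1 * x2)
    return [[3.0, x3 * math.sin(x2 * x3), x2 * math.sin(x2 * x3)],
            [2 * x1, -162 * (x2 + 0.1), math.cos(x3)],
            [-x2 * e, -x1 * e, 20.0]]

solution = newton_system(F, J, [0.1, 0.1, -0.1])
```

The linear system is solved directly at every step, so $J(\mathbf{x})^{-1}$ is never formed, which mirrors the two-step procedure described before the algorithm.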


Example 1 The nonlinear system

\[
\begin{aligned}
3x_1 - \cos(x_2 x_3) - \tfrac{1}{2} &= 0, \\
x_1^2 - 81(x_2 + 0.1)^2 + \sin x_3 + 1.06 &= 0, \\
e^{-x_1 x_2} + 20x_3 + \frac{10\pi - 3}{3} &= 0
\end{aligned}
\]

was shown in Example 2 of Section 10.1 to have the approximate solution $(0.5, 0, -0.52359877)^t$.
Apply Newton's method to this problem with $\mathbf{x}^{(0)} = (0.1, 0.1, -0.1)^t$.


Solution Define

\[
\mathbf{F}(x_1, x_2, x_3) = (f_1(x_1, x_2, x_3), f_2(x_1, x_2, x_3), f_3(x_1, x_2, x_3))^t,
\]

where

\[
\begin{aligned}
f_1(x_1, x_2, x_3) &= 3x_1 - \cos(x_2 x_3) - \tfrac{1}{2}, \\
f_2(x_1, x_2, x_3) &= x_1^2 - 81(x_2 + 0.1)^2 + \sin x_3 + 1.06,
\end{aligned}
\]

and

\[
f_3(x_1, x_2, x_3) = e^{-x_1 x_2} + 20x_3 + \frac{10\pi - 3}{3}.
\]

The Jacobian matrix $J(\mathbf{x})$ for this system is

\[
J(x_1, x_2, x_3) =
\begin{bmatrix}
3 & x_3 \sin x_2 x_3 & x_2 \sin x_2 x_3 \\
2x_1 & -162(x_2 + 0.1) & \cos x_3 \\
-x_2 e^{-x_1 x_2} & -x_1 e^{-x_1 x_2} & 20
\end{bmatrix}.
\]


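Let $\mathbf{x}^{(0)} = (0.1, 0.1, -0.1)^t$. Then $\mathbf{F}(\mathbf{x}^{(0)}) = (-1.199950, -2.269833417, 8.462025346)^t$

and

\[
J(\mathbf{x}^{(0)}) =
\begin{bmatrix}
3 & 9.999833334 \times 10^{-4} & -9.999833334 \times 10^{-4} \\
0.2 & -32.4 & 0.9950041653 \\
-0.09900498337 & -0.09900498337 & 20
\end{bmatrix}.
\]

Solving the linear system $J(\mathbf{x}^{(0)})\mathbf{y}^{(0)} = -\mathbf{F}(\mathbf{x}^{(0)})$ gives

\[
\mathbf{y}^{(0)} =
\begin{bmatrix} 0.3998696728 \\ -0.08053315147 \\ -0.4215204718 \end{bmatrix}
\quad \text{and} \quad
\mathbf{x}^{(1)} = \mathbf{x}^{(0)} + \mathbf{y}^{(0)} =
\begin{bmatrix} 0.4998696728 \\ 0.01946684853 \\ -0.5215204718 \end{bmatrix}.
\]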


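Continuing for $k = 2, 3, \ldots$, we have

\[
\begin{bmatrix} x_1^{(k)} \\ x_2^{(k)} \\ x_3^{(k)} \end{bmatrix}
=
\begin{bmatrix} x_1^{(k-1)} \\ x_2^{(k-1)} \\ x_3^{(k-1)} \end{bmatrix}
+
\begin{bmatrix} y_1^{(k-1)} \\ y_2^{(k-1)} \\ y_3^{(k-1)} \end{bmatrix},
\]

where

\[
\begin{bmatrix} y_1^{(k-1)} \\ y_2^{(k-1)} \\ y_3^{(k-1)} \end{bmatrix}
= -\left( J\left(x_1^{(k-1)}, x_2^{(k-1)}, x_3^{(k-1)}\right) \right)^{-1} \mathbf{F}\left(x_1^{(k-1)}, x_2^{(k-1)}, x_3^{(k-1)}\right).
\]

Thus, at the $k$th step, the linear system $J(\mathbf{x}^{(k-1)})\mathbf{y}^{(k-1)} = -\mathbf{F}(\mathbf{x}^{(k-1)})$ must be solved,
where

\[
J(\mathbf{x}^{(k-1)}) =
\begin{bmatrix}
3 & x_3^{(k-1)} \sin x_2^{(k-1)} x_3^{(k-1)} & x_2^{(k-1)} \sin x_2^{(k-1)} x_3^{(k-1)} \\
2x_1^{(k-1)} & -162\left(x_2^{(k-1)} + 0.1\right) & \cos x_3^{(k-1)} \\
-x_2^{(k-1)} e^{-x_1^{(k-1)} x_2^{(k-1)}} & -x_1^{(k-1)} e^{-x_1^{(k-1)} x_2^{(k-1)}} & 20
\end{bmatrix},
\qquad
\mathbf{y}^{(k-1)} = \begin{bmatrix} y_1^{(k-1)} \\ y_2^{(k-1)} \\ y_3^{(k-1)} \end{bmatrix},
\]

and

\[
\mathbf{F}(\mathbf{x}^{(k-1)}) =
\begin{bmatrix}
3x_1^{(k-1)} - \cos x_2^{(k-1)} x_3^{(k-1)} - \tfrac{1}{2} \\
\left(x_1^{(k-1)}\right)^2 - 81\left(x_2^{(k-1)} + 0.1\right)^2 + \sin x_3^{(k-1)} + 1.06 \\
e^{-x_1^{(k-1)} x_2^{(k-1)}} + 20x_3^{(k-1)} + \frac{10\pi - 3}{3}
\end{bmatrix}.
\]

The results using this iterative procedure are shown in Table 10.3.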
Table 10.3

k    $x_1^{(k)}$       $x_2^{(k)}$              $x_3^{(k)}$        $\|\mathbf{x}^{(k)} - \mathbf{x}^{(k-1)}\|_\infty$
0    0.1000000000    0.1000000000           -0.1000000000
1    0.4998696728    0.0194668485           -0.5215204718    0.4215204718
2    0.5000142403    0.0015885914           -0.5235569638    1.788 x 10^{-2}
3    0.5000000113    0.0000124448           -0.5235984500    1.576 x 10^{-3}
4    0.5000000000    8.516 x 10^{-10}       -0.5235987755    1.244 x 10^{-5}
5    0.5000000000    -1.375 x 10^{-11}      -0.5235987756    8.654 x 10^{-10}


The previous example illustrates that Newton’s method can converge very rapidly once 
a good approximation is obtained that is near the true solution. However, it is not always easy 
to determine good starting values, and the method is comparatively expensive to employ. In 
the next section, we consider a method for overcoming the latter weakness. Good starting 
values can usually be found using the Steepest Descent method, which will be discussed in 
Section 10.4. 


Using Maple for Initial Approximations 


The graphing facilities of Maple can assist in finding initial approximations to the solutions
of 2 x 2 and often 3 x 3 nonlinear systems. For example, the nonlinear system

\[
x_1^2 - x_2^2 + 2x_2 = 0, \qquad 2x_1 + x_2^2 - 6 = 0
\]

has, among its solutions, $(0.625204094, 2.179355825)^t$ and $(2.109511920, -1.334532188)^t$. To
use Maple we first define the two equations

eq1 := x1^2 - x2^2 + 2*x2 = 0; eq2 := 2*x1 + x2^2 - 6 = 0;

To obtain a graph of the two equations for $-6 \leq x_1, x_2 \leq 6$, enter the commands

with(plots): implicitplot({eq1, eq2}, x1 = -6..6, x2 = -6..6);

From the graph shown in Figure 10.2, we are able to estimate that there are solutions near
$(2.1, -1.3)^t$, $(0.64, 2.2)^t$, $(-1.9, 3.0)^t$, and $(-5.0, -4.0)^t$. This gives us good starting values
for Newton's method.


Figure 10.2 
The problem is more difficult in three dimensions. Consider the nonlinear system

\[
2x_1 - 3x_2 + x_3 - 4 = 0, \qquad 2x_1 + x_2 - x_3 + 4 = 0, \qquad x_1^2 + x_2^2 + x_3^2 - 4 = 0.
\]

Define three equations using the Maple commands

eq1 := 2*x1 - 3*x2 + x3 - 4 = 0; eq2 := 2*x1 + x2 - x3 + 4 = 0; eq3 := x1^2 + x2^2 + x3^2 - 4 = 0;

The third equation describes a sphere of radius 2 and center (0, 0, 0), so $x_1$, $x_2$, and $x_3$ are
in $[-2, 2]$. The Maple commands to obtain the graph in this case are

with(plots): implicitplot3d({eq1, eq2, eq3}, x1 = -2..2, x2 = -2..2, x3 = -2..2);

Various three-dimensional plotting options are available in Maple for isolating a solution
to the nonlinear system. For example, we can rotate the graph to better view the sections
of the surfaces. Then we can zoom into regions where the intersections lie and alter the
display form of the axes for a more accurate view of the intersection's coordinates. For this
problem, a reasonable initial approximation is $(x_1, x_2, x_3)^t = (-0.5, -1.5, 1.5)^t$.


EXERCISE SET 10.2

1. Use Newton's method with $\mathbf{x}^{(0)} = \mathbf{0}$ to compute $\mathbf{x}^{(2)}$ for each of the following nonlinear
   systems.
   a. $4x_1^2 - 20x_1 + \frac{1}{4}x_2^2 + 8 = 0$,
      $\frac{1}{2}x_1 x_2^2 + 2x_1 - 5x_2 + 8 = 0$.
   b. $\sin(4\pi x_1 x_2) - 2x_2 - x_1 = 0$,
      $\left(\frac{4\pi - 1}{4\pi}\right)(e^{2x_1} - e) + 4e x_2^2 - 2e x_1 = 0$.
   c. $x_1(1 - x_1) + 4x_2 = 12$,
      $(x_1 - 2)^2 + (2x_2 - 3)^2 = 25$.
   d. $5x_1^2 - x_2^2 = 0$,
      $x_2 - 0.25(\sin x_1 + \cos x_2) = 0$.

2. Use Newton's method with $\mathbf{x}^{(0)} = \mathbf{0}$ to compute $\mathbf{x}^{(2)}$ for each of the following nonlinear
   systems.
   a. $3x_1 - \cos(x_2 x_3) - \frac{1}{2} = 0$,
      $4x_1^2 - 625x_2^2 + 2x_2 - 1 = 0$,
      $e^{-x_1 x_2} + 20x_3 + \frac{10\pi - 3}{3} = 0$.
   b. $x_1^2 + x_2 - 37 = 0$,
      $x_1 - x_2^2 - 5 = 0$,
      $x_1 + x_2 + x_3 - 3 = 0$.
   c. $15x_1 + x_2^2 - 4x_3 = 13$,
      $x_1^2 + 10x_2 - x_3 = 11$,
      $x_2^3 - 25x_3 = -22$.
   d. $10x_1 - 2x_2^2 + x_2 - 2x_3 - 5 = 0$,
      $8x_2^2 + 4x_3^2 - 9 = 0$,
      $8x_2 x_3 + 4 = 0$.

3. Use the graphing facilities of Maple to approximate solutions to the following nonlinear
   systems.
   a. $4x_1^2 - 20x_1 + \frac{1}{4}x_2^2 + 8 = 0$,
      $\frac{1}{2}x_1 x_2^2 + 2x_1 - 5x_2 + 8 = 0$.
   b. $\sin(4\pi x_1 x_2) - 2x_2 - x_1 = 0$,
      $\left(\frac{4\pi - 1}{4\pi}\right)(e^{2x_1} - e) + 4e x_2^2 - 2e x_1 = 0$.
   c. $x_1(1 - x_1) + 4x_2 = 12$,
      $(x_1 - 2)^2 + (2x_2 - 3)^2 = 25$.
   d. $5x_1^2 - x_2^2 = 0$,
      $x_2 - 0.25(\sin x_1 + \cos x_2) = 0$.


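4. Use the graphing facilities of Maple to approximate solutions to the following nonlinear systems
   within the given limits.
   a. $3x_1 - \cos(x_2 x_3) - \frac{1}{2} = 0$,
      $4x_1^2 - 625x_2^2 + 2x_2 - 1 = 0$,
      $e^{-x_1 x_2} + 20x_3 + \frac{10\pi - 3}{3} = 0$.
      $-1 \leq x_1 \leq 1$, $-1 \leq x_2 \leq 1$, $-1 \leq x_3 \leq 1$
   b. $x_1^2 + x_2 - 37 = 0$,
      $x_1 - x_2^2 - 5 = 0$,
      $x_1 + x_2 + x_3 - 3 = 0$.
      $-4 \leq x_1 \leq 8$, $-2 \leq x_2 \leq 2$, $-6 \leq x_3 \leq 0$
   c. $15x_1 + x_2^2 - 4x_3 = 13$,
      $x_1^2 + 10x_2 - x_3 = 11$,
      $x_2^3 - 25x_3 = -22$.
      $0 \leq x_1 \leq 2$, $0 \leq x_2 \leq 2$, $0 \leq x_3 \leq 2$
   d. $10x_1 - 2x_2^2 + x_2 - 2x_3 - 5 = 0$,
      $8x_2^2 + 4x_3^2 - 9 = 0$,
      $8x_2 x_3 + 4 = 0$.
      $0 \leq x_1 \leq 2$, $-2 \leq x_2 \leq 0$, $0 \leq x_3 \leq 2$
      and $0 \leq x_1 \leq 2$, $0 \leq x_2 \leq 2$, $-2 \leq x_3 \leq 0$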


5. Use the answers obtained in Exercise 3 as initial approximations to Newton's method. Iterate until
   $\|\mathbf{x}^{(k)} - \mathbf{x}^{(k-1)}\|_\infty < 10^{-6}$.

6. Use the answers obtained in Exercise 4 as initial approximations to Newton's method. Iterate until
   $\|\mathbf{x}^{(k)} - \mathbf{x}^{(k-1)}\|_\infty < 10^{-6}$.

7. Use Newton's method to find a solution to the following nonlinear systems in the given domain. Iterate
   until $\|\mathbf{x}^{(k)} - \mathbf{x}^{(k-1)}\|_\infty < 10^{-6}$.
   a. $3x_1^2 - x_2^2 = 0$,
      $3x_1 x_2^2 - x_1^3 - 1 = 0$.
      Use $\mathbf{x}^{(0)} = (1, 1)^t$.
   b. $\ln(x_1^2 + x_2^2) - \sin(x_1 x_2) = \ln 2 + \ln \pi$,
      $e^{x_1 - x_2} + \cos(x_1 x_2) = 0$.
      Use $\mathbf{x}^{(0)} = (2, 2)^t$.
   c. $x_1^3 + x_1^2 x_2 - x_1 x_3 + 6 = 0$,
      $e^{x_1} + e^{x_2} - x_3 = 0$,
      $x_2^2 - 2x_1 x_3 = 4$.
      Use $\mathbf{x}^{(0)} = (-1, -2, 1)^t$.
   d. $6x_1 - 2\cos(x_2 x_3) - 1 = 0$,
      $9x_2 + \sqrt{x_1^2 + \sin x_3 + 1.06} + 0.9 = 0$,
      $60x_3 + 3e^{-x_1 x_2} + 10\pi - 3 = 0$.
      Use $\mathbf{x}^{(0)} = (0, 0, 0)^t$.

8. The nonlinear system

   $E_1$: $4x_1 - x_2 + x_3 = x_1 x_4$,      $E_2$: $-x_1 + 3x_2 - 2x_3 = x_2 x_4$,
   $E_3$: $x_1 - 2x_2 + 3x_3 = x_3 x_4$,     $E_4$: $x_1^2 + x_2^2 + x_3^2 = 1$

   has six solutions.
   a. Show that if $(x_1, x_2, x_3, x_4)^t$ is a solution then $(-x_1, -x_2, -x_3, x_4)^t$ is a solution.
   b. Use Newton's method three times to approximate all solutions. Iterate until
      $\|\mathbf{x}^{(k)} - \mathbf{x}^{(k-1)}\|_\infty < 10^{-5}$.


9. The nonlinear system

   \[
   \begin{aligned}
   3x_1 - \cos(x_2 x_3) - \tfrac{1}{2} &= 0, \\
   x_1^2 - 625x_2^2 - \tfrac{1}{4} &= 0, \\
   e^{-x_1 x_2} + 20x_3 + \frac{10\pi - 3}{3} &= 0
   \end{aligned}
   \]

   has a singular Jacobian matrix at the solution. Apply Newton's method with $\mathbf{x}^{(0)} = (1, 1, -1)^t$. Note
   that convergence may be slow or may not occur within a reasonable number of iterations.

10. What does Newton's method reduce to for the linear system $A\mathbf{x} = \mathbf{b}$ given by

   \[
   \begin{aligned}
   a_{11}x_1 + a_{12}x_2 + \cdots + a_{1n}x_n &= b_1, \\
   a_{21}x_1 + a_{22}x_2 + \cdots + a_{2n}x_n &= b_2, \\
   &\;\;\vdots \\
   a_{n1}x_1 + a_{n2}x_2 + \cdots + a_{nn}x_n &= b_n,
   \end{aligned}
   \]

   where $A$ is a nonsingular matrix?
11. Show that when $n = 1$, Newton's method given by Eq. (10.9) reduces to the familiar Newton's method
    given in Section 2.3.

12. The amount of pressure required to sink a large, heavy object in a soft homogeneous soil that lies
    above a hard base soil can be predicted by the amount of pressure required to sink smaller objects
    in the same soil. Specifically, the amount of pressure $p$ required to sink a circular plate of radius $r$ a
    distance $d$ in the soft soil, where the hard base soil lies a distance $D > d$ below the surface, can be
    approximated by an equation of the form

    \[
    p = k_1 e^{k_2 r} + k_3 r,
    \]

    where $k_1$, $k_2$, and $k_3$ are constants, with $k_2 > 0$, depending on $d$ and the consistency of the soil but not
    on the radius of the plate. (See [Bek], pp. 89-94.)

    a. Find the values of $k_1$, $k_2$, and $k_3$ if we assume that a plate of radius 1 in. requires a pressure of 10
       lb/in.$^2$ to sink 1 ft in a muddy field, a plate of radius 2 in. requires a pressure of 12 lb/in.$^2$ to sink
       1 ft, and a plate of radius 3 in. requires a pressure of 15 lb/in.$^2$ to sink this distance (assuming
       that the mud is more than 1 ft deep).

    b. Use your calculations from part (a) to predict the minimal size of circular plate that would be
       required to sustain a load of 500 lb on this field with sinkage of less than 1 ft.

13. In calculating the shape of a gravity-flow discharge chute that will minimize transit time of discharged
    granular particles, C. Chiarella, W. Charlton, and A. W. Roberts [CCR] solve the following equations
    by Newton's method:

    (i) $f_n(\theta_1, \ldots, \theta_N) = \dfrac{\sin \theta_{n+1}}{v_{n+1}}(1 - \mu w_{n+1}) - \dfrac{\sin \theta_n}{v_n}(1 - \mu w_n) = 0$, for each $n = 1, 2, \ldots, N - 1$.

    (ii) $f_N(\theta_1, \ldots, \theta_N) = \Delta y \sum_{i=1}^{N} \tan \theta_i - X = 0$, where

       a. $v_n^2 = v_0^2 + 2gn\Delta y - 2\mu \Delta y \sum_{j=1}^{n} \dfrac{1}{\cos \theta_j}$, for each $n = 1, 2, \ldots, N$, and

       b. [the defining expression for $w_n$ is not legible in the source], for each $n = 1, 2, \ldots, N$.

    The constant $v_0$ is the initial velocity of the granular material, $X$ is the x-coordinate of the end of
    the chute, $\mu$ is the friction force, $N$ is the number of chute segments, and $g = 32.17$ ft/s$^2$ is the
    gravitational constant. The variable $\theta_i$ is the angle of the $i$th chute segment from the vertical, as shown
    in the following figure, and $v_i$ is the particle velocity in the $i$th chute segment. Solve (i) and (ii) for
    $\boldsymbol{\theta} = (\theta_1, \ldots, \theta_N)^t$ with $\mu = 0$, $X = 2$, $\Delta y = 0.2$, $N = 20$, and $v_0 = 0$, where the values for $v_n$ and
    $w_n$ can be obtained directly from (a) and (b). Iterate until $\|\boldsymbol{\theta}^{(k)} - \boldsymbol{\theta}^{(k-1)}\|_\infty < 10^{-2}$.

    [Figure: chute geometry, starting at the origin (0, 0), with segments of vertical extent $\Delta y$.]


14. An interesting biological experiment (see [Schr2]) concerns the determination of the maximum water
    temperature, $X_M$, at which various species of hydra can survive without shortened life expectancy.
    One approach to the solution of this problem uses a weighted least squares fit of the form
    $f(x) = y = a/(x - b)^c$ to a collection of experimental data. The x-values of the data refer to water temperature.
    The constant $b$ is the asymptote of the graph of $f$ and as such is an approximation to $X_M$.

    a. Show that choosing $a$, $b$, and $c$ to minimize

       \[
       \sum_{i=1}^{n} \left[ w_i y_i - \frac{a}{(x_i - b)^c} \right]^2
       \]

       reduces to solving the nonlinear system

       \[
       \begin{aligned}
       a &= \sum_{i=1}^{n} \frac{w_i y_i}{(x_i - b)^c} \Bigg/ \sum_{i=1}^{n} \frac{1}{(x_i - b)^{2c}}, \\
       0 &= \sum_{i=1}^{n} \frac{w_i y_i}{(x_i - b)^c} \cdot \sum_{i=1}^{n} \frac{1}{(x_i - b)^{2c+1}} - \sum_{i=1}^{n} \frac{w_i y_i}{(x_i - b)^{c+1}} \cdot \sum_{i=1}^{n} \frac{1}{(x_i - b)^{2c}}, \\
       0 &= \sum_{i=1}^{n} \frac{w_i y_i}{(x_i - b)^c} \cdot \sum_{i=1}^{n} \frac{\ln(x_i - b)}{(x_i - b)^{2c}} - \sum_{i=1}^{n} \frac{w_i y_i \ln(x_i - b)}{(x_i - b)^c} \cdot \sum_{i=1}^{n} \frac{1}{(x_i - b)^{2c}}.
       \end{aligned}
       \]

    b. Solve the nonlinear system for the species with the following data. Use the weights $w_i = \ln y_i$.

       i     | 1    | 2    | 3    | 4
       $y_i$ | 2.40 | 3.80 | 4.75 | 21.60
       $x_i$ | 31.8 | 31.5 | 31.2 | 30.2
10.3 Quasi-Newton Methods

A significant weakness of Newton's method for solving systems of nonlinear equations
is the need, at each iteration, to determine a Jacobian matrix and solve an $n \times n$ linear
system that involves this matrix. Consider the amount of computation associated with one
iteration of Newton's method. The Jacobian matrix associated with a system of $n$ nonlinear
equations written in the form $\mathbf{F}(\mathbf{x}) = \mathbf{0}$ requires that the $n^2$ partial derivatives of
the $n$ component functions of $\mathbf{F}$ be determined and evaluated. In most situations, the exact
evaluation of the partial derivatives is inconvenient, although the problem has been
made more tractable with the widespread use of symbolic computation systems, such as
Maple.

When the exact evaluation is not practical, we can use finite difference approximations
to the partial derivatives. For example,

\[
\frac{\partial f_j}{\partial x_k}(\mathbf{x}^{(i)}) \approx \frac{f_j(\mathbf{x}^{(i)} + \mathbf{e}_k h) - f_j(\mathbf{x}^{(i)})}{h}, \tag{10.10}
\]

where $h$ is small in absolute value and $\mathbf{e}_k$ is the vector whose only nonzero entry is a 1
in the $k$th coordinate. This approximation, however, still requires that at least $n^2$ scalar
functional evaluations be performed to approximate the Jacobian and does not decrease the
amount of calculation, in general $O(n^3)$, required for solving the linear system involving
this approximate Jacobian.
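
The approximation (10.10) can be sketched in a few lines of Python. The function name `fd_jacobian`, the default step `h`, and the two-variable test function `F_demo` are assumptions introduced only for this illustration; they do not come from the text.

```python
def fd_jacobian(F, x, h=1e-6):
    """Forward-difference approximation to the Jacobian of F at x, as in (10.10)."""
    n = len(x)
    fx = F(x)                                # one evaluation of F at x
    jac = [[0.0] * n for _ in range(n)]
    for k in range(n):
        xh = list(x)
        xh[k] += h                           # x + e_k h
        fxh = F(xh)                          # one more evaluation of F per column
        for j in range(n):
            jac[j][k] = (fxh[j] - fx[j]) / h
    return jac

def F_demo(x):
    # Hypothetical test system: f1 = x1^2 + x2, f2 = x1 * x2.
    return [x[0] ** 2 + x[1], x[0] * x[1]]

approx = fd_jacobian(F_demo, [1.0, 2.0])     # exact Jacobian there is [[2, 1], [2, 1]]
```

Each column costs one evaluation of the vector function $\mathbf{F}$, that is, $n$ scalar evaluations, so the whole matrix needs $n^2$ scalar evaluations beyond $\mathbf{F}(\mathbf{x})$ itself, in line with the count given in the text.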

The total computational effort for just one iteration of Newton's method is consequently
at least $n^2 + n$ scalar functional evaluations ($n^2$ for the evaluation of the Jacobian matrix
and $n$ for the evaluation of $\mathbf{F}$) together with $O(n^3)$ arithmetic operations to solve the linear
system. This amount of computational effort is extensive, except for relatively small values
of $n$ and easily evaluated scalar functions.
In this section we consider a generalization of the Secant method to systems of nonlinear
equations, a technique known as Broyden's method (see [Broy]). The method requires
only $n$ scalar functional evaluations per iteration and also reduces the number of arithmetic
calculations to $O(n^2)$. It belongs to a class of methods known as least-change secant updates
that produce algorithms called quasi-Newton. These methods replace the Jacobian
matrix in Newton's method with an approximation matrix that is easily updated at each
iteration.

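The disadvantage of the quasi-Newton methods is that the quadratic convergence of
Newton's method is lost, being replaced, in general, by a convergence called superlinear.
This implies that

\[
\lim_{i \to \infty} \frac{\|\mathbf{x}^{(i+1)} - \mathbf{p}\|}{\|\mathbf{x}^{(i)} - \mathbf{p}\|} = 0,
\]

where $\mathbf{p}$ denotes the solution to $\mathbf{F}(\mathbf{x}) = \mathbf{0}$ and $\mathbf{x}^{(i)}$ and $\mathbf{x}^{(i+1)}$ are consecutive approximations
to $\mathbf{p}$.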

In most applications, the reduction to superlinear convergence is a more than acceptable 
trade-off for the decrease in the amount of computation. An additional disadvantage of quasi- 
Newton methods is that, unlike Newton’s method, they are not self-correcting. Newton’s 
method will generally correct for roundoff error with successive iterations, but unless special 
safeguards are incorporated, Broyden’s method will not. 

To describe Broyden's method, suppose that an initial approximation $\mathbf{x}^{(0)}$ is given to
the solution $\mathbf{p}$ of $\mathbf{F}(\mathbf{x}) = \mathbf{0}$. We calculate the next approximation $\mathbf{x}^{(1)}$ in the same manner as
Newton's method. If it is inconvenient to determine $J(\mathbf{x}^{(0)})$ exactly, we use the difference
equations given by (10.10) to approximate the partial derivatives. To compute $\mathbf{x}^{(2)}$, however,
we depart from Newton's method and examine the Secant method for a single nonlinear
equation. The Secant method uses the approximation

\[
f'(x_1) \approx \frac{f(x_1) - f(x_0)}{x_1 - x_0}
\]

as a replacement for $f'(x_1)$ in the single-variable Newton's method.

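For nonlinear systems, $\mathbf{x}^{(1)} - \mathbf{x}^{(0)}$ is a vector, so the corresponding quotient is undefined.
However, the method proceeds similarly in that we replace the matrix $J(\mathbf{x}^{(1)})$ in Newton's
method for systems by a matrix $A_1$ with the property that

\[
A_1\left(\mathbf{x}^{(1)} - \mathbf{x}^{(0)}\right) = \mathbf{F}(\mathbf{x}^{(1)}) - \mathbf{F}(\mathbf{x}^{(0)}). \tag{10.11}
\]

Any nonzero vector in $\mathbb{R}^n$ can be written as the sum of a multiple of $\mathbf{x}^{(1)} - \mathbf{x}^{(0)}$ and a
multiple of a vector in the orthogonal complement of $\mathbf{x}^{(1)} - \mathbf{x}^{(0)}$. So, to uniquely define the
matrix $A_1$, we also need to specify how it acts on the orthogonal complement of $\mathbf{x}^{(1)} - \mathbf{x}^{(0)}$.
No information is available about the change in $\mathbf{F}$ in a direction orthogonal to $\mathbf{x}^{(1)} - \mathbf{x}^{(0)}$,
so we specify that no change be made in this direction, that is,

\[
A_1\mathbf{z} = J(\mathbf{x}^{(0)})\mathbf{z}, \quad \text{whenever } \left(\mathbf{x}^{(1)} - \mathbf{x}^{(0)}\right)^t \mathbf{z} = 0. \tag{10.12}
\]

Thus, any vector orthogonal to $\mathbf{x}^{(1)} - \mathbf{x}^{(0)}$ is unaffected by the update from $J(\mathbf{x}^{(0)})$, which
was used to compute $\mathbf{x}^{(1)}$, to $A_1$, which is used in the determination of $\mathbf{x}^{(2)}$.
Conditions (10.11) and (10.12) uniquely define $A_1$ (see [DM]) as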


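\[
A_1 = J(\mathbf{x}^{(0)}) + \frac{\left[\mathbf{F}(\mathbf{x}^{(1)}) - \mathbf{F}(\mathbf{x}^{(0)}) - J(\mathbf{x}^{(0)})\left(\mathbf{x}^{(1)} - \mathbf{x}^{(0)}\right)\right]\left(\mathbf{x}^{(1)} - \mathbf{x}^{(0)}\right)^t}{\|\mathbf{x}^{(1)} - \mathbf{x}^{(0)}\|_2^2}.
\]

It is this matrix that is used in place of $J(\mathbf{x}^{(1)})$ to determine $\mathbf{x}^{(2)}$ as

\[
\mathbf{x}^{(2)} = \mathbf{x}^{(1)} - A_1^{-1}\mathbf{F}(\mathbf{x}^{(1)}).
\]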


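Once $\mathbf{x}^{(2)}$ has been determined, the method is repeated to determine $\mathbf{x}^{(3)}$, using $A_1$ in place
of $A_0 \equiv J(\mathbf{x}^{(0)})$, and with $\mathbf{x}^{(2)}$ and $\mathbf{x}^{(1)}$ in place of $\mathbf{x}^{(1)}$ and $\mathbf{x}^{(0)}$.
In general, once $\mathbf{x}^{(i)}$ has been determined, $\mathbf{x}^{(i+1)}$ is computed by

\[
A_i = A_{i-1} + \frac{\mathbf{y}_i - A_{i-1}\mathbf{s}_i}{\|\mathbf{s}_i\|_2^2}\,\mathbf{s}_i^t \tag{10.13}
\]

and

\[
\mathbf{x}^{(i+1)} = \mathbf{x}^{(i)} - A_i^{-1}\mathbf{F}(\mathbf{x}^{(i)}), \tag{10.14}
\]

where the notation $\mathbf{y}_i = \mathbf{F}(\mathbf{x}^{(i)}) - \mathbf{F}(\mathbf{x}^{(i-1)})$ and $\mathbf{s}_i = \mathbf{x}^{(i)} - \mathbf{x}^{(i-1)}$ is introduced to simplify
the equations.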
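
As a small numerical check of the update (10.13), the following Python sketch forms $A_i$ from $A_{i-1}$, $\mathbf{s}_i$, and $\mathbf{y}_i$ and confirms the two defining properties: the secant condition $A_i\mathbf{s}_i = \mathbf{y}_i$ and the fact that vectors orthogonal to $\mathbf{s}_i$ are mapped as before. The function name `broyden_update` and the 2 x 2 data are made up for this illustration.

```python
def broyden_update(A, s, y):
    """Return A + (y - A s) s^t / ||s||_2^2, the rank-one update in Eq. (10.13)."""
    n = len(s)
    As = [sum(A[i][j] * s[j] for j in range(n)) for i in range(n)]
    ss = sum(sj * sj for sj in s)            # ||s||_2^2
    return [[A[i][j] + (y[i] - As[i]) * s[j] / ss for j in range(n)]
            for i in range(n)]

# Hypothetical data, chosen only to exercise the formula.
A_prev = [[2.0, 0.0], [1.0, 3.0]]
s = [1.0, 2.0]
y = [0.5, -1.0]
A_new = broyden_update(A_prev, s, y)

# Secant condition: A_new s should reproduce y.
A_new_s = [sum(A_new[i][j] * s[j] for j in range(2)) for i in range(2)]

# z is orthogonal to s, so A_new z should equal A_prev z.
z = [2.0, -1.0]
A_new_z = [sum(A_new[i][j] * z[j] for j in range(2)) for i in range(2)]
A_prev_z = [sum(A_prev[i][j] * z[j] for j in range(2)) for i in range(2)]
```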

If the method was performed as outlined in Eqs. (10.13) and (10.14), the number
of scalar functional evaluations would be reduced from $n^2 + n$ to $n$ (those required for
evaluating $\mathbf{F}(\mathbf{x}^{(i)})$), but $O(n^3)$ calculations would still be required to solve the associated $n \times n$
linear system (see Step 4 in Algorithm 10.1)

\[
A_i\mathbf{s}_{i+1} = -\mathbf{F}(\mathbf{x}^{(i)}). \tag{10.15}
\]

Employing the method in this form would not be justified because of the reduction to
superlinear convergence from the quadratic convergence of Newton's method.
Sherman-Morrison Formula

A considerable improvement can be incorporated, however, by employing a matrix inversion
formula of Sherman and Morrison (see, for example, [DM], p. 55).

Theorem 10.8 (Sherman-Morrison Formula)

Suppose that $A$ is a nonsingular matrix and that $\mathbf{x}$ and $\mathbf{y}$ are vectors with $\mathbf{y}^t A^{-1}\mathbf{x} \neq -1$.
Then $A + \mathbf{x}\mathbf{y}^t$ is nonsingular and

\[
\left(A + \mathbf{x}\mathbf{y}^t\right)^{-1} = A^{-1} - \frac{A^{-1}\mathbf{x}\mathbf{y}^t A^{-1}}{1 + \mathbf{y}^t A^{-1}\mathbf{x}}.
\]

The Sherman-Morrison formula permits $A_i^{-1}$ to be computed directly from $A_{i-1}^{-1}$, eliminating
the need for a matrix inversion with each iteration.
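
The formula of Theorem 10.8 is easy to check numerically. The sketch below, with an assumed helper `inv2` for 2 x 2 inverses and made-up data, compares the Sherman-Morrison result with a direct inversion of $A + \mathbf{x}\mathbf{y}^t$; none of these names or values come from the text.

```python
def inv2(m):
    """Inverse of a 2 x 2 matrix by the adjugate formula."""
    (a, b), (c, d) = m
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

def sherman_morrison(A_inv, x, y):
    """Return (A + x y^t)^{-1} given A^{-1}, assuming y^t A^{-1} x != -1."""
    n = len(x)
    Ainv_x = [sum(A_inv[i][j] * x[j] for j in range(n)) for i in range(n)]
    yt_Ainv = [sum(y[i] * A_inv[i][j] for i in range(n)) for j in range(n)]
    denom = 1.0 + sum(y[i] * Ainv_x[i] for i in range(n))
    return [[A_inv[i][j] - Ainv_x[i] * yt_Ainv[j] / denom for j in range(n)]
            for i in range(n)]

# Hypothetical data for the check.
A = [[4.0, 1.0], [2.0, 3.0]]
x = [1.0, 1.0]
y = [0.5, 2.0]

updated = sherman_morrison(inv2(A), x, y)
direct = inv2([[A[i][j] + x[i] * y[j] for j in range(2)] for i in range(2)])
```

Only matrix-vector products and one outer product appear in `sherman_morrison`, which is why the update costs $O(n^2)$ operations rather than the $O(n^3)$ of a fresh inversion.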
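Letting $A = A_{i-1}$, $\mathbf{x} = (\mathbf{y}_i - A_{i-1}\mathbf{s}_i)/\|\mathbf{s}_i\|_2^2$, and $\mathbf{y} = \mathbf{s}_i$, in Eq. (10.13) gives

\[
\begin{aligned}
A_i^{-1} &= \left(A_{i-1} + \frac{\mathbf{y}_i - A_{i-1}\mathbf{s}_i}{\|\mathbf{s}_i\|_2^2}\,\mathbf{s}_i^t\right)^{-1} \\
&= A_{i-1}^{-1} - \frac{A_{i-1}^{-1}\left(\dfrac{\mathbf{y}_i - A_{i-1}\mathbf{s}_i}{\|\mathbf{s}_i\|_2^2}\,\mathbf{s}_i^t\right)A_{i-1}^{-1}}{1 + \mathbf{s}_i^t A_{i-1}^{-1}\left(\dfrac{\mathbf{y}_i - A_{i-1}\mathbf{s}_i}{\|\mathbf{s}_i\|_2^2}\right)} \\
&= A_{i-1}^{-1} - \frac{\left(A_{i-1}^{-1}\mathbf{y}_i - \mathbf{s}_i\right)\mathbf{s}_i^t A_{i-1}^{-1}}{\|\mathbf{s}_i\|_2^2 + \mathbf{s}_i^t A_{i-1}^{-1}\mathbf{y}_i - \|\mathbf{s}_i\|_2^2},
\end{aligned}
\]

so

\[
A_i^{-1} = A_{i-1}^{-1} + \frac{\left(\mathbf{s}_i - A_{i-1}^{-1}\mathbf{y}_i\right)\mathbf{s}_i^t A_{i-1}^{-1}}{\mathbf{s}_i^t A_{i-1}^{-1}\mathbf{y}_i}. \tag{10.16}
\]

This computation involves only matrix-vector multiplications at each step and therefore
requires only $O(n^2)$ arithmetic calculations. The calculation of $A_i$ is bypassed, as is the
necessity of solving the linear system (10.15).

Algorithm 10.2 follows directly from this construction, incorporating (10.16) into the
iterative technique (10.14).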


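Broyden

To approximate the solution of the nonlinear system $\mathbf{F}(\mathbf{x}) = \mathbf{0}$ given an initial approximation $\mathbf{x}$:

INPUT number $n$ of equations and unknowns; initial approximation $\mathbf{x} = (x_1, \ldots, x_n)^t$;
tolerance TOL; maximum number of iterations $N$.

OUTPUT approximate solution $\mathbf{x} = (x_1, \ldots, x_n)^t$ or a message that the number of
iterations was exceeded.

Step 1 Set $A_0 = J(\mathbf{x})$ where $J(\mathbf{x})_{i,j} = \frac{\partial f_i}{\partial x_j}(\mathbf{x})$ for $1 \leq i, j \leq n$;
       $\mathbf{v} = \mathbf{F}(\mathbf{x})$. (Note: $\mathbf{v} = \mathbf{F}(\mathbf{x}^{(0)})$.)

Step 2 Set $A = A_0^{-1}$. (Use Gaussian elimination.)

Step 3 Set $\mathbf{s} = -A\mathbf{v}$; (Note: $\mathbf{s} = \mathbf{s}_1$.)
       $\mathbf{x} = \mathbf{x} + \mathbf{s}$; (Note: $\mathbf{x} = \mathbf{x}^{(1)}$.)
       $k = 2$.

Step 4 While ($k \leq N$) do Steps 5-13.

    Step 5 Set $\mathbf{w} = \mathbf{v}$; (Save $\mathbf{v}$.)
           $\mathbf{v} = \mathbf{F}(\mathbf{x})$; (Note: $\mathbf{v} = \mathbf{F}(\mathbf{x}^{(k)})$.)
           $\mathbf{y} = \mathbf{v} - \mathbf{w}$. (Note: $\mathbf{y} = \mathbf{y}_k$.)

    Step 6 Set $\mathbf{z} = -A\mathbf{y}$. (Note: $\mathbf{z} = -A_{k-1}^{-1}\mathbf{y}_k$.)

    Step 7 Set $p = -\mathbf{s}^t\mathbf{z}$. (Note: $p = \mathbf{s}_k^t A_{k-1}^{-1}\mathbf{y}_k$.)

    Step 8 Set $\mathbf{u}^t = \mathbf{s}^t A$.

    Step 9 Set $A = A + \frac{1}{p}(\mathbf{s} + \mathbf{z})\mathbf{u}^t$. (Note: $A = A_k^{-1}$.)

    Step 10 Set $\mathbf{s} = -A\mathbf{v}$. (Note: $\mathbf{s} = -A_k^{-1}\mathbf{F}(\mathbf{x}^{(k)})$.)

    Step 11 Set $\mathbf{x} = \mathbf{x} + \mathbf{s}$. (Note: $\mathbf{x} = \mathbf{x}^{(k+1)}$.)

    Step 12 If $\|\mathbf{s}\| <$ TOL then OUTPUT ($\mathbf{x}$);
            (The procedure was successful.)
            STOP.

    Step 13 Set $k = k + 1$.

Step 14 OUTPUT ('Maximum number of iterations exceeded');
        (The procedure was unsuccessful.)
        STOP.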
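
A compact Python sketch of Algorithm 10.2 follows. The helper names `invert`, `matvec`, and `broyden`, the default tolerance, and the choice of the $l_2$ norm in the stopping test are assumptions of this sketch. The functions `F` and `J` encode the three-equation system that was solved by Newton's method in Example 1 of Section 10.2.

```python
import math

def invert(a):
    """Invert a square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    m = [row[:] + [1.0 if i == j else 0.0 for j in range(n)]
         for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if m[pivot][col] == 0.0:
            raise ZeroDivisionError("singular matrix")
        m[col], m[pivot] = m[pivot], m[col]
        p = m[col][col]
        m[col] = [v / p for v in m[col]]
        for r in range(n):
            if r != col:
                f = m[r][col]
                m[r] = [vr - f * vc for vr, vc in zip(m[r], m[col])]
    return [row[n:] for row in m]

def matvec(a, v):
    return [sum(aij * vj for aij, vj in zip(row, v)) for row in a]

def broyden(F, J, x0, tol=1e-8, max_iter=50):
    x = list(x0)
    n = len(x)
    v = F(x)                                            # Step 1
    A = invert(J(x))                                    # Step 2: A stores the inverse
    s = [-t for t in matvec(A, v)]                      # Step 3
    x = [xi + si for xi, si in zip(x, s)]
    for _ in range(2, max_iter + 1):                    # Step 4
        w, v = v, F(x)                                  # Step 5
        y = [vi - wi for vi, wi in zip(v, w)]
        z = [-t for t in matvec(A, y)]                  # Step 6
        p = -sum(si * zi for si, zi in zip(s, z))       # Step 7
        u = [sum(s[i] * A[i][j] for i in range(n)) for j in range(n)]  # Step 8
        A = [[A[i][j] + (s[i] + z[i]) * u[j] / p for j in range(n)]
             for i in range(n)]                         # Step 9: Eq. (10.16)
        s = [-t for t in matvec(A, v)]                  # Step 10
        x = [xi + si for xi, si in zip(x, s)]           # Step 11
        if math.sqrt(sum(si * si for si in s)) < tol:   # Step 12
            return x
    raise RuntimeError("Maximum number of iterations exceeded")

def F(x):
    x1, x2, x3 = x
    return [3 * x1 - math.cos(x2 * x3) - 0.5,
            x1 ** 2 - 81 * (x2 + 0.1) ** 2 + math.sin(x3) + 1.06,
            math.exp(-x1 * x2) + 20 * x3 + (10 * math.pi - 3) / 3]

def J(x):
    x1, x2, x3 = x
    e = math.exp(-x1 * x2)
    return [[3.0, x3 * math.sin(x2 * x3), x2 * math.sin(x2 * x3)],
            [2 * x1, -162 * (x2 + 0.1), math.cos(x3)],
            [-x2 * e, -x1 * e, 20.0]]

solution = broyden(F, J, [0.1, 0.1, -0.1])
```

The Jacobian is evaluated and inverted once, at the starting point. Every later pass through the loop uses one evaluation of $\mathbf{F}$ and a handful of matrix-vector products.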
Example 1 Use Broyden's method with $\mathbf{x}^{(0)} = (0.1, 0.1, -0.1)^t$ to approximate the solution to the
nonlinear system

\[
\begin{aligned}
3x_1 - \cos(x_2 x_3) - \tfrac{1}{2} &= 0, \\
x_1^2 - 81(x_2 + 0.1)^2 + \sin x_3 + 1.06 &= 0, \\
e^{-x_1 x_2} + 20x_3 + \frac{10\pi - 3}{3} &= 0.
\end{aligned}
\]


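Solution This system was solved by Newton's method in Example 1 of Section 10.2. The
Jacobian matrix for this system is

\[
J(x_1, x_2, x_3) =
\begin{bmatrix}
3 & x_3 \sin x_2 x_3 & x_2 \sin x_2 x_3 \\
2x_1 & -162(x_2 + 0.1) & \cos x_3 \\
-x_2 e^{-x_1 x_2} & -x_1 e^{-x_1 x_2} & 20
\end{bmatrix}.
\]

Let $\mathbf{x}^{(0)} = (0.1, 0.1, -0.1)^t$ and

\[
\mathbf{F}(x_1, x_2, x_3) = (f_1(x_1, x_2, x_3), f_2(x_1, x_2, x_3), f_3(x_1, x_2, x_3))^t,
\]

where

\[
\begin{aligned}
f_1(x_1, x_2, x_3) &= 3x_1 - \cos(x_2 x_3) - \tfrac{1}{2}, \\
f_2(x_1, x_2, x_3) &= x_1^2 - 81(x_2 + 0.1)^2 + \sin x_3 + 1.06,
\end{aligned}
\]

and

\[
f_3(x_1, x_2, x_3) = e^{-x_1 x_2} + 20x_3 + \frac{10\pi - 3}{3}.
\]

Then

\[
\mathbf{F}(\mathbf{x}^{(0)}) = \begin{bmatrix} -1.199950 \\ -2.269833 \\ 8.462025 \end{bmatrix}.
\]

Because

\[
A_0 = J\left(x_1^{(0)}, x_2^{(0)}, x_3^{(0)}\right) =
\begin{bmatrix}
3 & 9.999833 \times 10^{-4} & -9.999833 \times 10^{-4} \\
0.2 & -32.4 & 0.9950042 \\
-9.900498 \times 10^{-2} & -9.900498 \times 10^{-2} & 20
\end{bmatrix},
\]

we have

\[
A_0^{-1} = J\left(x_1^{(0)}, x_2^{(0)}, x_3^{(0)}\right)^{-1} =
\begin{bmatrix}
0.3333332 & 1.023852 \times 10^{-5} & 1.615701 \times 10^{-5} \\
2.108607 \times 10^{-3} & -3.086883 \times 10^{-2} & 1.535836 \times 10^{-3} \\
1.660520 \times 10^{-3} & -1.527577 \times 10^{-4} & 5.000768 \times 10^{-2}
\end{bmatrix}.
\]

So

\[
\mathbf{x}^{(1)} = \mathbf{x}^{(0)} - A_0^{-1}\mathbf{F}(\mathbf{x}^{(0)}) = \begin{bmatrix} 0.4998697 \\ 1.946685 \times 10^{-2} \\ -0.5215205 \end{bmatrix},
\qquad
\mathbf{F}(\mathbf{x}^{(1)}) = \begin{bmatrix} -3.394465 \times 10^{-4} \\ -0.3443879 \\ 3.188238 \times 10^{-2} \end{bmatrix},
\]

\[
\mathbf{y}_1 = \mathbf{F}(\mathbf{x}^{(1)}) - \mathbf{F}(\mathbf{x}^{(0)}) = \begin{bmatrix} 1.199611 \\ 1.925445 \\ -8.430143 \end{bmatrix},
\qquad
\mathbf{s}_1 = \begin{bmatrix} 0.3998697 \\ -8.053315 \times 10^{-2} \\ -0.4215204 \end{bmatrix},
\]

\[
\mathbf{s}_1^t A_0^{-1}\mathbf{y}_1 = 0.3424604,
\]

\[
A_1^{-1} = A_0^{-1} + (1/0.3424604)\left[\left(\mathbf{s}_1 - A_0^{-1}\mathbf{y}_1\right)\mathbf{s}_1^t A_0^{-1}\right]
=
\begin{bmatrix}
0.3333781 & 1.11050 \times 10^{-5} & 8.967344 \times 10^{-6} \\
-2.021270 \times 10^{-3} & -3.094849 \times 10^{-2} & 2.196906 \times 10^{-3} \\
1.022214 \times 10^{-3} & -1.650709 \times 10^{-4} & 5.010986 \times 10^{-2}
\end{bmatrix},
\]

and

\[
\mathbf{x}^{(2)} = \mathbf{x}^{(1)} - A_1^{-1}\mathbf{F}(\mathbf{x}^{(1)}) = \begin{bmatrix} 0.4999863 \\ 8.737833 \times 10^{-3} \\ -0.5231746 \end{bmatrix}.
\]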

Additional iterations are listed in Table 10.4. The fifth iteration of Broyden's method is
slightly less accurate than was the fourth iteration of Newton's method given in the example
at the end of the preceding section.

Table 10.4

k    $x_1^{(k)}$    $x_2^{(k)}$               $x_3^{(k)}$     $\|\mathbf{x}^{(k)} - \mathbf{x}^{(k-1)}\|_2$
3    0.5000066    8.672157 x 10^{-4}      -0.5236918    7.88 x 10^{-3}
4    0.5000003    6.083352 x 10^{-5}      -0.5235954    8.12 x 10^{-4}
5    0.5000000    -1.448889 x 10^{-6}     -0.5235989    6.24 x 10^{-5}
6    0.5000000    6.059030 x 10^{-9}      -0.5235988    1.50 x 10^{-6}


Procedures are also available that maintain quadratic convergence but significantly 
reduce the number of required functional evaluations. Methods of this type were originally 
proposed by Brown [Brow,K]. A survey and comparison of some commonly used methods 
of this type can be found in [MC]. In general, however, these methods are much more 
difficult to implement efficiently than Broyden’s method. 


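EXERCISE SET 10.3

1. Use Broyden's method with $\mathbf{x}^{(0)} = \mathbf{0}$ to compute $\mathbf{x}^{(2)}$ for each of the following nonlinear systems.
   a. $4x_1^2 - 20x_1 + \frac{1}{4}x_2^2 + 8 = 0$,
      $\frac{1}{2}x_1 x_2^2 + 2x_1 - 5x_2 + 8 = 0$.
   b. $\sin(4\pi x_1 x_2) - 2x_2 - x_1 = 0$,
      $\left(\frac{4\pi - 1}{4\pi}\right)(e^{2x_1} - e) + 4e x_2^2 - 2e x_1 = 0$.
   c. $3x_1^2 - x_2^2 = 0$,
      $3x_1 x_2^2 - x_1^3 - 1 = 0$.
      Use $\mathbf{x}^{(0)} = (1, 1)^t$.
   d. $\ln(x_1^2 + x_2^2) - \sin(x_1 x_2) = \ln 2 + \ln \pi$,
      $e^{x_1 - x_2} + \cos(x_1 x_2) = 0$.
      Use $\mathbf{x}^{(0)} = (2, 2)^t$.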
2. Use Broyden's method with $\mathbf{x}^{(0)} = \mathbf{0}$ to compute $\mathbf{x}^{(2)}$ for each of the following nonlinear systems.
   a. $3x_1 - \cos(x_2 x_3) - \frac{1}{2} = 0$,
      $4x_1^2 - 625x_2^2 + 2x_2 - 1 = 0$,
      $e^{-x_1 x_2} + 20x_3 + \frac{10\pi - 3}{3} = 0$.
   b. $x_1^2 + x_2 - 37 = 0$,
      $x_1 - x_2^2 - 5 = 0$,
      $x_1 + x_2 + x_3 - 3 = 0$.
   c. $x_1^3 + x_1^2 x_2 - x_1 x_3 + 6 = 0$,
      $e^{x_1} + e^{x_2} - x_3 = 0$,
      $x_2^2 - 2x_1 x_3 = 4$.
      Use $\mathbf{x}^{(0)} = (-1, -2, 1)^t$.
   d. $6x_1 - 2\cos(x_2 x_3) - 1 = 0$,
      $9x_2 + \sqrt{x_1^2 + \sin x_3 + 1.06} + 0.9 = 0$,
      $60x_3 + 3e^{-x_1 x_2} + 10\pi - 3 = 0$.
      Use $\mathbf{x}^{(0)} = (0, 0, 0)^t$.

3. Use Broyden's method to approximate solutions to the nonlinear systems in Exercise 1 using the
   following initial approximations $\mathbf{x}^{(0)}$.
   a. $(0, 0)^t$    b. $(0, 0)^t$    c. $(1, 1)^t$    d. $(2, 2)^t$

4. Use Broyden's method to approximate solutions to the nonlinear systems in Exercise 2 using the
   following initial approximations $\mathbf{x}^{(0)}$.
   a. $(1, 1, 1)^t$    b. $(2, 1, -1)^t$    c. $(-1, -2, 1)^t$    d. $(0, 0, 0)^t$
5. Use Broyden's method to approximate solutions to the following nonlinear systems. Iterate until
   $\|\mathbf{x}^{(k)} - \mathbf{x}^{(k-1)}\|_\infty < 10^{-6}$.
   a. $x_1(1 - x_1) + 4x_2 = 12$,
      $(x_1 - 2)^2 + (2x_2 - 3)^2 = 25$.
   b. $5x_1^2 - x_2^2 = 0$,
      $x_2 - 0.25(\sin x_1 + \cos x_2) = 0$.
   c. $15x_1 + x_2^2 - 4x_3 = 13$,
      $x_1^2 + 10x_2 - x_3 = 11$,
      $x_2^3 - 25x_3 = -22$.
   d. $10x_1 - 2x_2^2 + x_2 - 2x_3 - 5 = 0$,
      $8x_2^2 + 4x_3^2 - 9 = 0$,
      $8x_2 x_3 + 4 = 0$.

6. The nonlinear system

   \[
   \begin{aligned}
   4x_1 - x_2 + x_3 &= x_1 x_4, \\
   -x_1 + 3x_2 - 2x_3 &= x_2 x_4, \\
   x_1 - 2x_2 + 3x_3 &= x_3 x_4, \\
   x_1^2 + x_2^2 + x_3^2 &= 1
   \end{aligned}
   \]

   has six solutions.

   a. Show that if $(x_1, x_2, x_3, x_4)^t$ is a solution then $(-x_1, -x_2, -x_3, x_4)^t$ is a solution.

   b. Use Broyden's method three times to approximate each solution. Iterate until
      $\|\mathbf{x}^{(k)} - \mathbf{x}^{(k-1)}\|_\infty < 10^{-5}$.

7. The nonlinear system

   \[
   3x_1 - \cos(x_2 x_3) - \tfrac{1}{2} = 0, \qquad x_1^2 - 625x_2^2 - \tfrac{1}{4} = 0, \qquad e^{-x_1 x_2} + 20x_3 + \frac{10\pi - 3}{3} = 0
   \]

   has a singular Jacobian matrix at the solution. Apply Broyden's method with $\mathbf{x}^{(0)} = (1, 1, -1)^t$. Note
   that convergence may be slow or may not occur within a reasonable number of iterations.

8. Show that if $\mathbf{0} \neq \mathbf{y} \in \mathbb{R}^n$ and $\mathbf{z} \in \mathbb{R}^n$, then $\mathbf{z} = \mathbf{z}_1 + \mathbf{z}_2$, where $\mathbf{z}_1 = (\mathbf{y}^t\mathbf{z}/\|\mathbf{y}\|_2^2)\mathbf{y}$ is parallel to $\mathbf{y}$ and
   $\mathbf{z}_2$ is orthogonal to $\mathbf{y}$.

9. Show that if $\mathbf{u}, \mathbf{v} \in \mathbb{R}^n$, then $\det(I + \mathbf{u}\mathbf{v}^t) = 1 + \mathbf{v}^t\mathbf{u}$.

10. a. Use the result in Exercise 9 to show that if $A^{-1}$ exists and $\mathbf{x}, \mathbf{y} \in \mathbb{R}^n$, then $(A + \mathbf{x}\mathbf{y}^t)^{-1}$ exists if
       and only if $\mathbf{y}^t A^{-1}\mathbf{x} \neq -1$.

    b. By multiplying on the right by $A + \mathbf{x}\mathbf{y}^t$, show that when $\mathbf{y}^t A^{-1}\mathbf{x} \neq -1$ we have

       \[
       \left(A + \mathbf{x}\mathbf{y}^t\right)^{-1} = A^{-1} - \frac{A^{-1}\mathbf{x}\mathbf{y}^t A^{-1}}{1 + \mathbf{y}^t A^{-1}\mathbf{x}}.
       \]

11. Exercise 13 of Section 8.1 dealt with determining an exponential least squares relationship of the
    form $R = bw^a$ to approximate a collection of data relating the weight and respiration rate of Modest
    sphinx moths. In that exercise, the problem was converted to a log-log relationship, and in part (c),
    a quadratic term was introduced in an attempt to improve the approximation. Instead of converting
    the problem, determine the constants $a$ and $b$ that minimize $\sum_{i=1}^{n}(R_i - bw_i^a)^2$ for the data listed in
    Exercise 13 of 8.1. Compute the error associated with this approximation, and compare this to the
    error of the previous approximations for this problem.


10.4 Steepest Descent Techniques


The advantage of the Newton and quasi-Newton methods for solving systems of nonlinear 
equations is their speed of convergence once a sufficiently accurate approximation is known. 
A weakness of these methods is that an accurate initial approximation to the solution is 
needed to ensure convergence. The Steepest Descent method considered in this section 
converges only linearly to the solution, but it will usually converge even for poor initial 
approximations. As a consequence, this method is used to find sufficiently accurate starting 
approximations for the Newton-based techniques in the same way the Bisection method is 
used for a single equation. 

The method of Steepest Descent determines a local minimum for a multivariable function
of the form $g : \mathbb{R}^n \to \mathbb{R}$. The method is valuable quite apart from the application as a
starting method for solving nonlinear systems. (Some other applications are considered in
the exercises.)

(The name for the Steepest Descent method follows from the three-dimensional application of
pointing in the steepest downward direction.)

The connection between the minimization of a function from $\mathbb{R}^n$ to $\mathbb{R}$ and the solution
of a system of nonlinear equations is due to the fact that a system of the form

\[
\begin{aligned}
f_1(x_1, x_2, \ldots, x_n) &= 0, \\
f_2(x_1, x_2, \ldots, x_n) &= 0, \\
&\;\;\vdots \\
f_n(x_1, x_2, \ldots, x_n) &= 0,
\end{aligned}
\]

has a solution at $\mathbf{x} = (x_1, x_2, \ldots, x_n)^t$ precisely when the function $g$ defined by

\[
g(x_1, x_2, \ldots, x_n) = \sum_{i=1}^{n} \left[f_i(x_1, x_2, \ldots, x_n)\right]^2
\]

has the minimal value 0.


The method of Steepest Descent for finding a local minimum for an arbitrary function
$g$ from $\mathbb{R}^n$ into $\mathbb{R}$ can be intuitively described as follows:

1. Evaluate $g$ at an initial approximation $\mathbf{x}^{(0)} = \left(x_1^{(0)}, x_2^{(0)}, \ldots, x_n^{(0)}\right)^t$.

2. Determine a direction from $\mathbf{x}^{(0)}$ that results in a decrease in the value of $g$.

3. Move an appropriate amount in this direction and call the new value $\mathbf{x}^{(1)}$.

4. Repeat steps 1 through 3 with $\mathbf{x}^{(0)}$ replaced by $\mathbf{x}^{(1)}$.


The Gradient of a Function 


Before describing how to choose the correct direction and the appropriate distance to move
in this direction, we need to review some results from calculus. The Extreme Value
Theorem 1.9 states that a differentiable single-variable function can have a relative minimum
only when the derivative is zero. To extend this result to multivariable functions, we need
the following definition.

(The root gradient comes from the Latin word gradi, meaning "to walk". In this sense, the gradient
of a surface is the rate at which it "walks uphill".)


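For $g : \mathbb{R}^n \to \mathbb{R}$, the gradient of $g$ at $\mathbf{x} = (x_1, x_2, \ldots, x_n)^t$ is denoted $\nabla g(\mathbf{x})$ and defined by

\[
\nabla g(\mathbf{x}) = \left(\frac{\partial g}{\partial x_1}(\mathbf{x}), \frac{\partial g}{\partial x_2}(\mathbf{x}), \ldots, \frac{\partial g}{\partial x_n}(\mathbf{x})\right)^t.
\]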


The gradient for a multivariable function is analogous to the derivative of a single-variable
function in the sense that a differentiable multivariable function can have a relative
minimum at $\mathbf{x}$ only when the gradient at $\mathbf{x}$ is the zero vector. The gradient has another
important property connected with the minimization of multivariable functions. Suppose
that $\mathbf{v} = (v_1, v_2, \ldots, v_n)^t$ is a unit vector in $\mathbb{R}^n$; that is,

\[
\|\mathbf{v}\|_2^2 = \sum_{i=1}^{n} v_i^2 = 1.
\]

The directional derivative of $g$ at $\mathbf{x}$ in the direction of $\mathbf{v}$ measures the change in
the value of the function $g$ relative to the change in the variable in the direction of $\mathbf{v}$. It is
defined by

\[
D_{\mathbf{v}}\,g(\mathbf{x}) = \lim_{h \to 0} \frac{1}{h}\left[g(\mathbf{x} + h\mathbf{v}) - g(\mathbf{x})\right] = \mathbf{v}^t \cdot \nabla g(\mathbf{x}).
\]

When $g$ is differentiable, the direction that produces the maximum value for the directional
derivative occurs when $\mathbf{v}$ is chosen to be parallel to $\nabla g(\mathbf{x})$, provided that $\nabla g(\mathbf{x}) \neq \mathbf{0}$. As a
consequence, the direction of greatest decrease in the value of $g$ at $\mathbf{x}$ is the direction given
by $-\nabla g(\mathbf{x})$. Figure 10.3 is an illustration when $g$ is a function of two variables.
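
The relationship between the directional derivative and the gradient can be checked numerically. In the sketch below, the quadratic `g`, the point `x`, and the helper `directional_derivative` are hypothetical choices made only for illustration. The code estimates $D_{\mathbf{v}}\,g(\mathbf{x})$ by a forward difference along the normalized negative gradient and samples $\mathbf{v}^t \cdot \nabla g(\mathbf{x})$ over unit vectors spaced one degree apart.

```python
import math

def g(x):
    # Hypothetical smooth function of two variables.
    return x[0] ** 2 + 3.0 * x[1] ** 2

def grad_g(x):
    return [2.0 * x[0], 6.0 * x[1]]

def directional_derivative(func, x, v, h=1e-6):
    """Forward-difference estimate of D_v func(x) for a unit vector v."""
    xh = [xi + h * vi for xi, vi in zip(x, v)]
    return (func(xh) - func(x)) / h

x = [1.0, 1.0]
grad = grad_g(x)
norm = math.sqrt(sum(c * c for c in grad))
steepest = [-c / norm for c in grad]         # unit vector along -grad g(x)

# Exact values v^t grad g(x) for unit vectors sampled around the circle.
angles = [2.0 * math.pi * k / 360 for k in range(360)]
slopes = [math.cos(a) * grad[0] + math.sin(a) * grad[1] for a in angles]

estimate = directional_derivative(g, x, steepest)
```

No sampled direction gives a slope below $-\|\nabla g(\mathbf{x})\|_2$, and the estimate along `steepest` comes close to that bound.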


[Figure 10.3: the surface $z = g(x_1, x_2)$, with the point $(x_1, x_2, g(x_1, x_2))$ marked.]


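The object is to reduce $g(\mathbf{x})$ to its minimal value of zero, so an appropriate choice for
$\mathbf{x}^{(1)}$ is to move away from $\mathbf{x}^{(0)}$ in the direction that gives the greatest decrease in the value
of $g(\mathbf{x})$. Hence we let

\[
\mathbf{x}^{(1)} = \mathbf{x}^{(0)} - \alpha \nabla g(\mathbf{x}^{(0)}), \quad \text{for some constant } \alpha > 0. \tag{10.17}
\]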
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CHAPTER 10 o 


Example 1 


Numerical Solutions of Nonlinear Systems of Equations 


The problem now reduces to choosing an appropriate value of @ so that ga) will be 
significantly less than g(x). 

To determine an appropriate choice for the value a, we consider the single-variable 
function 


h(a) = g(x — aVe(x)). (10.18) 


The value of aw that minimizes h is the value needed for Eq. (10.17). 

Finding a minimal value for $h$ directly would require differentiating $h$ and then solving
a root-finding problem to determine the critical points of $h$. This procedure is generally too
costly. Instead, we choose three numbers $\alpha_1 < \alpha_2 < \alpha_3$ that, we hope, are close to where
the minimum value of $h(\alpha)$ occurs. We then construct the quadratic polynomial $P(x)$ that
interpolates $h$ at $\alpha_1$, $\alpha_2$, and $\alpha_3$. The minimum of a quadratic polynomial is easily found in
a manner similar to that used in Müller’s method in Section 2.6.

We define $\hat{\alpha}$ in $[\alpha_1, \alpha_3]$ so that $P(\hat{\alpha})$ is a minimum in $[\alpha_1, \alpha_3]$ and use $P(\hat{\alpha})$ to
approximate the minimal value of $h(\alpha)$. Then $\hat{\alpha}$ is used to determine the new iterate for
approximating the minimal value of $g$:


\[ \mathbf{x}^{(1)} = \mathbf{x}^{(0)} - \hat{\alpha} \nabla g(\mathbf{x}^{(0)}). \]


Because $g(\mathbf{x}^{(0)})$ is available, to minimize the computation we first choose $\alpha_1 = 0$. Next a
number $\alpha_3$ is found with $h(\alpha_3) < h(\alpha_1)$. (Since $\alpha_1$ does not minimize $h$, such a number $\alpha_3$
does exist.) Finally, $\alpha_2$ is chosen to be $\alpha_3/2$.

The minimum value of $P$ on $[\alpha_1, \alpha_3]$ occurs either at the only critical point of $P$ or at
the right endpoint $\alpha_3$ because, by assumption, $P(\alpha_3) = h(\alpha_3) < h(\alpha_1) = P(\alpha_1)$. Because
$P(x)$ is a quadratic polynomial, the critical point can be found by solving a linear equation.
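
The step selection just described can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the name `quadratic_step` and the sample function `h` are hypothetical, and the sketch returns only the critical point of the interpolating quadratic through $\alpha_1 = 0$, $\alpha_2 = \alpha_3/2$, and $\alpha_3$; comparing it with the right endpoint $\alpha_3$ is left to the caller.

```python
def quadratic_step(h, a3):
    """Return the critical point of the quadratic interpolating h at 0, a3/2, a3.

    Assumes the caller has already arranged h(a3) < h(0).
    """
    a2 = a3 / 2.0
    g1, g2, g3 = h(0.0), h(a2), h(a3)
    h1 = (g2 - g1) / a2              # first divided difference on [0, a2]
    h2 = (g3 - g2) / (a3 - a2)       # first divided difference on [a2, a3]
    h3 = (h2 - h1) / a3              # second divided difference
    # P(a) = g1 + h1*a + h3*a*(a - a2), so P'(a) = h1 + h3*(2a - a2) = 0.
    return 0.5 * (a2 - h1 / h3)


# Hypothetical single-variable function whose minimum is at 0.3.
def h(a):
    return (a - 0.3) ** 2 + 1.0


alpha_hat = quadratic_step(h, a3=0.5)
```

Because the sample `h` is itself quadratic, the interpolant coincides with it and `alpha_hat` recovers the minimizer 0.3 up to rounding. For a general $h$ the value is only an estimate.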


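Example 1 Use the Steepest Descent method with $\mathbf{x}^{(0)} = (0, 0, 0)^t$ to find a reasonable starting
approximation to the solution of the nonlinear system

\[
\begin{aligned}
f_1(x_1, x_2, x_3) &= 3x_1 - \cos(x_2 x_3) - \tfrac{1}{2} = 0, \\
f_2(x_1, x_2, x_3) &= x_1^2 - 81(x_2 + 0.1)^2 + \sin x_3 + 1.06 = 0, \\
f_3(x_1, x_2, x_3) &= e^{-x_1 x_2} + 20x_3 + \frac{10\pi - 3}{3} = 0.
\end{aligned}
\]

Solution Let $g(x_1, x_2, x_3) = [f_1(x_1, x_2, x_3)]^2 + [f_2(x_1, x_2, x_3)]^2 + [f_3(x_1, x_2, x_3)]^2$. Then

\[
\begin{aligned}
\nabla g(x_1, x_2, x_3) \equiv \nabla g(\mathbf{x}) = \Big(
& 2f_1(\mathbf{x})\frac{\partial f_1}{\partial x_1}(\mathbf{x}) + 2f_2(\mathbf{x})\frac{\partial f_2}{\partial x_1}(\mathbf{x}) + 2f_3(\mathbf{x})\frac{\partial f_3}{\partial x_1}(\mathbf{x}), \\
& 2f_1(\mathbf{x})\frac{\partial f_1}{\partial x_2}(\mathbf{x}) + 2f_2(\mathbf{x})\frac{\partial f_2}{\partial x_2}(\mathbf{x}) + 2f_3(\mathbf{x})\frac{\partial f_3}{\partial x_2}(\mathbf{x}), \\
& 2f_1(\mathbf{x})\frac{\partial f_1}{\partial x_3}(\mathbf{x}) + 2f_2(\mathbf{x})\frac{\partial f_2}{\partial x_3}(\mathbf{x}) + 2f_3(\mathbf{x})\frac{\partial f_3}{\partial x_3}(\mathbf{x}) \Big) \\
= {} & 2J(\mathbf{x})^t \mathbf{F}(\mathbf{x}).
\end{aligned}
\]

For $\mathbf{x}^{(0)} = (0, 0, 0)^t$, we have

\[ g(\mathbf{x}^{(0)}) = 111.975 \quad \text{and} \quad z_0 = \|\nabla g(\mathbf{x}^{(0)})\|_2 = 419.554. \]

Let

\[ \mathbf{z} = \frac{1}{z_0} \nabla g(\mathbf{x}^{(0)}) = (-0.0214514, -0.0193062, 0.999583)^t. \]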
With $\alpha_1 = 0$, we have $g_1 = g(\mathbf{x}^{(0)} - \alpha_1 \mathbf{z}) = g(\mathbf{x}^{(0)}) = 111.975$. We arbitrarily let $\alpha_3 = 1$
so that
\[ g_3 = g(\mathbf{x}^{(0)} - \alpha_3 \mathbf{z}) = 93.5649. \]
Because $g_3 < g_1$, we accept $\alpha_3$ and set $\alpha_2 = \alpha_3/2 = 0.5$. Thus
\[ g_2 = g(\mathbf{x}^{(0)} - \alpha_2 \mathbf{z}) = 2.53557. \]


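We now find the quadratic polynomial that interpolates the data (0, 111.975), (1, 93.5649),
and (0.5, 2.53557). It is most convenient to use Newton’s forward divided-difference
interpolating polynomial for this purpose, which has the form

\[ P(\alpha) = g_1 + h_1 \alpha + h_3 \alpha(\alpha - \alpha_2). \]

This interpolates

\[ g\left(\mathbf{x}^{(0)} - \alpha \nabla g(\mathbf{x}^{(0)})\right) = g(\mathbf{x}^{(0)} - \alpha \mathbf{z}) \]

at $\alpha_1 = 0$, $\alpha_2 = 0.5$, and $\alpha_3 = 1$ as follows:

\[
\begin{array}{lll}
\alpha_1 = 0, & g_1 = 111.975, & \\[4pt]
\alpha_2 = 0.5, & g_2 = 2.53557, & h_1 = \dfrac{g_2 - g_1}{\alpha_2 - \alpha_1} = -218.878, \\[8pt]
\alpha_3 = 1, & g_3 = 93.5649, & h_2 = \dfrac{g_3 - g_2}{\alpha_3 - \alpha_2} = 182.059, \quad h_3 = \dfrac{h_2 - h_1}{\alpha_3 - \alpha_1} = 400.937.
\end{array}
\]

Thus

\[ P(\alpha) = 111.975 - 218.878\alpha + 400.937\alpha(\alpha - 0.5). \]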


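We have $P'(\alpha) = 0$ when $\alpha = \alpha_0 = 0.522959$. Since $g_0 = g(\mathbf{x}^{(0)} - \alpha_0 \mathbf{z}) = 2.32762$ is
smaller than $g_1$ and $g_3$, we set

\[ \mathbf{x}^{(1)} = \mathbf{x}^{(0)} - \alpha_0 \mathbf{z} = \mathbf{x}^{(0)} - 0.522959\mathbf{z} = (0.0112182, 0.0100964, -0.522741)^t \]

and

\[ g(\mathbf{x}^{(1)}) = 2.32762. \]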


Table 10.5 contains the remainder of the results. A true solution to the nonlinear system
is $(0.5, 0, -0.5235988)^t$, so $\mathbf{x}^{(2)}$ would likely be adequate as an initial approximation for
Newton’s method or Broyden’s method. One of these quicker converging techniques would
be appropriate at this stage, since 70 iterations of the Steepest Descent method are required
to find $\|\mathbf{x}^{(k)} - \mathbf{x}\|_\infty < 0.01$.
Table 10.5

k    $x_1^{(k)}$    $x_2^{(k)}$      $x_3^{(k)}$    $g(x_1^{(k)}, x_2^{(k)}, x_3^{(k)})$
2    0.137860    -0.205453      -0.522059    1.27406
3    0.266959     0.00551102    -0.558494    1.06813
4    0.272734    -0.00811751    -0.522006    0.468309
5    0.308689    -0.0204026     -0.533112    0.381087
6    0.314308    -0.0147046     -0.520923    0.318837
7    0.324267    -0.00852549    -0.528431    0.287024
Algorithm 10.3 applies the method of Steepest Descent to approximate the minimal
value of $g(\mathbf{x})$. To begin an iteration, the value 0 is assigned to $\alpha_1$, and the value 1 is assigned
to $\alpha_3$. If $h(\alpha_3) \geq h(\alpha_1)$, then successive divisions of $\alpha_3$ by 2 are performed and the value
of $\alpha_3$ is reassigned until $h(\alpha_3) < h(\alpha_1)$ and $\alpha_3 = 2^{-k}$ for some value of $k$.

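To employ the method to approximate the solution to the system

\[
\begin{aligned}
f_1(x_1, x_2, \ldots, x_n) &= 0, \\
f_2(x_1, x_2, \ldots, x_n) &= 0, \\
&\;\;\vdots \\
f_n(x_1, x_2, \ldots, x_n) &= 0,
\end{aligned}
\]

we simply replace the function $g$ with $\sum_{i=1}^{n} f_i^2$.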


Steepest Descent
To approximate a solution $\mathbf{p}$ to the minimization problem
\[ g(\mathbf{p}) = \min_{\mathbf{x} \in \mathbb{R}^n} g(\mathbf{x}) \]
given an initial approximation $\mathbf{x}$:


INPUT number $n$ of variables; initial approximation $\mathbf{x} = (x_1, \ldots, x_n)^t$; tolerance
TOL; maximum number of iterations $N$.


OUTPUT approximate solution $\mathbf{x} = (x_1, \ldots, x_n)^t$ or a message of failure.

Step 1 Set $k = 1$.

Step 2 While ($k \leq N$) do Steps 3-15.

    Step 3 Set $g_1 = g(x_1, \ldots, x_n)$; (Note: $g_1 = g(\mathbf{x}^{(k)})$.)
               $\mathbf{z} = \nabla g(x_1, \ldots, x_n)$; (Note: $\mathbf{z} = \nabla g(\mathbf{x}^{(k)})$.)
               $z_0 = \|\mathbf{z}\|_2$.

    Step 4 If $z_0 = 0$ then OUTPUT (‘Zero gradient’);
                            OUTPUT ($x_1, \ldots, x_n, g_1$);
                            (The procedure completed, might have a minimum.)
                            STOP.

    Step 5 Set $\mathbf{z} = \mathbf{z}/z_0$; (Make $\mathbf{z}$ a unit vector.)
               $\alpha_1 = 0$;
               $\alpha_3 = 1$;
               $g_3 = g(\mathbf{x} - \alpha_3 \mathbf{z})$.

    Step 6 While ($g_3 \geq g_1$) do Steps 7 and 8.

        Step 7 Set $\alpha_3 = \alpha_3/2$;
                   $g_3 = g(\mathbf{x} - \alpha_3 \mathbf{z})$.

        Step 8 If $\alpha_3 < TOL/2$ then
                   OUTPUT (‘No likely improvement’);
                   OUTPUT ($x_1, \ldots, x_n, g_1$);
                   (The procedure completed, might have a minimum.)
                   STOP.

    Step 9 Set $\alpha_2 = \alpha_3/2$;
               $g_2 = g(\mathbf{x} - \alpha_2 \mathbf{z})$.

    Step 10 Set $h_1 = (g_2 - g_1)/\alpha_2$;
                $h_2 = (g_3 - g_2)/(\alpha_3 - \alpha_2)$;
                $h_3 = (h_2 - h_1)/\alpha_3$.
                (Note: Newton’s forward divided-difference formula is used to find
                the quadratic $P(\alpha) = g_1 + h_1\alpha + h_3\alpha(\alpha - \alpha_2)$ that interpolates
                $h(\alpha)$ at $\alpha = 0$, $\alpha = \alpha_2$, $\alpha = \alpha_3$.)

    Step 11 Set $\alpha_0 = 0.5(\alpha_2 - h_1/h_3)$; (The critical point of $P$ occurs at $\alpha_0$.)
                $g_0 = g(\mathbf{x} - \alpha_0 \mathbf{z})$.

    Step 12 Find $\alpha$ from $\{\alpha_0, \alpha_3\}$ so that $g = g(\mathbf{x} - \alpha \mathbf{z}) = \min\{g_0, g_3\}$.

    Step 13 Set $\mathbf{x} = \mathbf{x} - \alpha \mathbf{z}$.

    Step 14 If $|g - g_1| < TOL$ then
                OUTPUT ($x_1, \ldots, x_n, g$);
                (The procedure was successful.)
                STOP.

    Step 15 Set $k = k + 1$.

Step 16 OUTPUT (‘Maximum iterations exceeded’);
        (The procedure was unsuccessful.)
        STOP.
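
As a rough companion to Algorithm 10.3, the following Python sketch follows the same steps using only the standard library. It is an illustration, not a reference implementation: the function names (`steepest_descent`, `grad_g`, `trial`) and the returned status strings are assumptions of this sketch, and the test problem is the system of Example 1 with $g = f_1^2 + f_2^2 + f_3^2$ and $\nabla g = 2J^t\mathbf{F}$. The sketch does not guard against $h_3 = 0$ in Step 11.

```python
import math


def F(x):
    """Residuals f1, f2, f3 of the system in Example 1."""
    x1, x2, x3 = x
    return [3 * x1 - math.cos(x2 * x3) - 0.5,
            x1 ** 2 - 81 * (x2 + 0.1) ** 2 + math.sin(x3) + 1.06,
            math.exp(-x1 * x2) + 20 * x3 + (10 * math.pi - 3) / 3]


def J(x):
    """Jacobian matrix of F."""
    x1, x2, x3 = x
    s, e = math.sin(x2 * x3), math.exp(-x1 * x2)
    return [[3.0, x3 * s, x2 * s],
            [2 * x1, -162 * (x2 + 0.1), math.cos(x3)],
            [-x2 * e, -x1 * e, 20.0]]


def g(x):
    return sum(f * f for f in F(x))


def grad_g(x):
    """Gradient of g, computed as 2 J(x)^t F(x)."""
    Fx, Jx = F(x), J(x)
    return [2 * sum(Jx[i][j] * Fx[i] for i in range(3)) for j in range(3)]


def steepest_descent(g, grad, x, tol, max_iter):
    """Return (x, g(x), status), following Steps 1-16 of the algorithm."""
    for _ in range(max_iter):
        g1 = g(x)
        z = grad(x)
        z0 = math.sqrt(sum(c * c for c in z))
        if z0 == 0:
            return x, g1, "zero gradient"
        z = [c / z0 for c in z]                     # unit direction (Step 5)

        def trial(a, x=x, z=z):
            return [xi - a * zi for xi, zi in zip(x, z)]

        a3 = 1.0
        g3 = g(trial(a3))
        while g3 >= g1:                             # Steps 6-8: halve alpha_3
            a3 /= 2
            g3 = g(trial(a3))
            if a3 < tol / 2:
                return x, g1, "no likely improvement"
        a2 = a3 / 2
        g2 = g(trial(a2))
        h1 = (g2 - g1) / a2
        h2 = (g3 - g2) / (a3 - a2)
        h3 = (h2 - h1) / a3
        a0 = 0.5 * (a2 - h1 / h3)                   # critical point of P
        g0 = g(trial(a0))
        a, g_new = (a0, g0) if g0 <= g3 else (a3, g3)
        x = trial(a)
        if abs(g_new - g1) < tol:
            return x, g_new, "converged"
    return x, g(x), "maximum iterations exceeded"


# One iteration from x^(0) = (0, 0, 0)^t, as in Example 1.
x_new, g_new, status = steepest_descent(
    g, grad_g, [0.0, 0.0, 0.0], tol=0.05, max_iter=1)
```

With `max_iter=1` the call performs the single step worked out in Example 1, so `x_new` and `g_new` can be compared with $\mathbf{x}^{(1)}$ and $g(\mathbf{x}^{(1)})$ given there. Raising `max_iter` continues the iteration summarized in Table 10.5.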


There are many variations of the method of Steepest Descent, some of which involve
more intricate methods for determining the value of $\alpha$ that will produce a minimum for
the single-variable function $h$ defined in Eq. (10.18). Other techniques use a multidimensional
Taylor polynomial to replace the original multivariable function $g$ and minimize the
polynomial instead of $g$. Although there are advantages to some of these methods over the
procedure discussed here, all the Steepest Descent methods are, in general, linearly convergent
and converge independent of the starting approximation. In some instances, however,
the methods may converge to something other than the absolute minimum of the function $g$.

A more complete discussion of Steepest Descent methods can be found in [OR] or [RR].


EXERCISE SET 10.4 


1. Use the method of Steepest Descent with TOL = 0.05 to approximate the solutions of the following
nonlinear systems.

a. $4x_1^2 - 20x_1 + \frac{1}{4}x_2^2 + 8 = 0$,
   $\frac{1}{2}x_1 x_2^2 + 2x_1 - 5x_2 + 8 = 0$.

b. $3x_1^2 - x_2^2 = 0$,
   $3x_1 x_2^2 - x_1^3 - 1 = 0$.

c. $\ln(x_1^2 + x_2^2) - \sin(x_1 x_2) = \ln 2 + \ln \pi$,
   $e^{x_1 - x_2} + \cos(x_1 x_2) = 0$.

d. $\sin(4\pi x_1 x_2) - 2x_2 - x_1 = 0$,
   $\left(\frac{4\pi - 1}{4\pi}\right)(e^{2x_1} - e) + 4e x_2^2 - 2e x_1 = 0$.
2. Use the method of Steepest Descent with TOL = 0.05 to approximate the solutions of the following
nonlinear systems.

a. $15x_1 + x_2^2 - 4x_3 = 13$,
   $x_1^2 + 10x_2 - x_3 = 11$,
   $x_2^3 - 25x_3 = -22$.

b. $10x_1 - 2x_2^2 + x_2 - 2x_3 - 5 = 0$,
   $8x_2^2 + 4x_3^2 - 9 = 0$,
   $8x_2 x_3 + 4 = 0$.

c. $x_1^3 + x_1^2 x_2 - x_1 x_3 + 6 = 0$,
   $e^{x_1} + e^{x_2} - x_3 = 0$,
   $x_2^2 - 2x_1 x_3 = 4$.

d. $x_1 + \cos(x_1 x_2 x_3) - 1 = 0$,
   $(1 - x_1)^{1/4} + x_2 + 0.05x_3^2 - 0.15x_3 - 1 = 0$,
   $-x_1^2 - 0.1x_2^2 + 0.01x_2 + x_3 - 1 = 0$.


3. Use the results in Exercise 1 and Newton’s method to approximate the solutions of the nonlinear
systems in Exercise 1 to within $10^{-6}$.


4. Use the results of Exercise 2 and Newton’s method to approximate the solutions of the nonlinear
systems in Exercise 2 to within $10^{-6}$.


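5. Use the method of Steepest Descent to approximate minima to within 0.005 for the following
functions.

a. $g(x_1, x_2) = \cos(x_1 + x_2) + \sin x_1 + \cos x_2$

b. $g(x_1, x_2) = 100(x_1^2 - x_2)^2 + (1 - x_1)^2$

c. $g(x_1, x_2, x_3) = x_1^2 + 2x_2^2 + x_3^2 - 2x_1 x_2 + 2x_1 - 2.5x_2 - x_3 + 2$

d. $g(x_1, x_2, x_3) = x_1^4 + 2x_2^4 + 3x_3^4 + 1.01$


6. a. Show that the quadratic polynomial

\[ P(\alpha) = g_1 + h_1\alpha + h_3\alpha(\alpha - \alpha_2) \]

interpolates the function $h$ defined in (10.18):

\[ h(\alpha) = g\left(\mathbf{x}^{(0)} - \alpha \nabla g(\mathbf{x}^{(0)})\right) \]

at $\alpha = 0$, $\alpha_2$, and $\alpha_3$.

b. Show that a critical point of $P$ occurs at

\[ \alpha_0 = \frac{1}{2}\left(\alpha_2 - \frac{h_1}{h_3}\right). \]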

10.5 Homotopy and Continuation Methods


Homotopy, or continuation, methods for nonlinear systems embed the problem to be solved
within a collection of problems. Specifically, to solve a problem of the form

\[ \mathbf{F}(\mathbf{x}) = \mathbf{0}, \]

which has the unknown solution $\mathbf{x}^*$, we consider a family of problems described using a
parameter $\lambda$ that assumes values in $[0, 1]$. A problem with a known solution $\mathbf{x}(0)$ corresponds
to the situation when $\lambda = 0$, and the problem with the unknown solution $\mathbf{x}(1) = \mathbf{x}^*$
corresponds to $\lambda = 1$.

A homotopy is a continuous deformation; a function that takes a real interval continuously into a
set of functions.

For example, suppose $\mathbf{x}(0)$ is an initial approximation to the solution of $\mathbf{F}(\mathbf{x}^*) = \mathbf{0}$.
Define

\[ \mathbf{G} : [0, 1] \times \mathbb{R}^n \to \mathbb{R}^n \]

by

\[ \mathbf{G}(\lambda, \mathbf{x}) = \lambda \mathbf{F}(\mathbf{x}) + (1 - \lambda)\left[\mathbf{F}(\mathbf{x}) - \mathbf{F}(\mathbf{x}(0))\right] = \mathbf{F}(\mathbf{x}) + (\lambda - 1)\mathbf{F}(\mathbf{x}(0)). \tag{10.19} \]
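We will determine, for various values of $\lambda$, a solution to
\[ \mathbf{G}(\lambda, \mathbf{x}) = \mathbf{0}. \]
When $\lambda = 0$, this equation assumes the form
\[ \mathbf{0} = \mathbf{G}(0, \mathbf{x}) = \mathbf{F}(\mathbf{x}) - \mathbf{F}(\mathbf{x}(0)), \]
and $\mathbf{x}(0)$ is a solution. When $\lambda = 1$, the equation assumes the form
\[ \mathbf{0} = \mathbf{G}(1, \mathbf{x}) = \mathbf{F}(\mathbf{x}), \]
and $\mathbf{x}(1) = \mathbf{x}^*$ is a solution.

The function $\mathbf{G}$, with the parameter $\lambda$, provides us with a family of functions that
can lead from the known value $\mathbf{x}(0)$ to the solution $\mathbf{x}(1) = \mathbf{x}^*$. The function $\mathbf{G}$ is called a
homotopy between the function $\mathbf{G}(0, \mathbf{x}) = \mathbf{F}(\mathbf{x}) - \mathbf{F}(\mathbf{x}(0))$ and the function $\mathbf{G}(1, \mathbf{x}) = \mathbf{F}(\mathbf{x})$.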
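
The two endpoint properties of the homotopy (10.19) are easy to check numerically. The short Python sketch below is illustrative only: `make_homotopy` is a hypothetical helper name, and the two-equation system `F` is an invented example, not one from the text.

```python
def make_homotopy(F, x0):
    """Build G(lam, x) = F(x) + (lam - 1) F(x0), the homotopy of Eq. (10.19)."""
    F0 = F(x0)                       # F(x(0)) is evaluated once and reused

    def G(lam, x):
        return [fx + (lam - 1.0) * f0 for fx, f0 in zip(F(x), F0)]

    return G


# Hypothetical system used only to exercise the definition.
def F(x):
    x1, x2 = x
    return [x1 ** 2 + x2 ** 2 - 4.0, x1 - x2]


x0 = [1.0, 0.0]
G = make_homotopy(F, x0)

at_start = G(0.0, x0)          # x(0) solves G(0, x) = 0
at_end = G(1.0, [2.0, 2.0])    # G(1, x) coincides with F(x)
```

Here `at_start` is the zero vector, because $\mathbf{G}(0, \mathbf{x}(0)) = \mathbf{F}(\mathbf{x}(0)) - \mathbf{F}(\mathbf{x}(0))$, and `at_end` equals `F([2.0, 2.0])`, because the term involving $\mathbf{F}(\mathbf{x}(0))$ vanishes at $\lambda = 1$.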


Continuation 


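The continuation problem is to:


• Determine a way to proceed from the known solution $\mathbf{x}(0)$ of $\mathbf{G}(0, \mathbf{x}) = \mathbf{0}$ to the unknown
solution $\mathbf{x}(1) = \mathbf{x}^*$ of $\mathbf{G}(1, \mathbf{x}) = \mathbf{0}$, that is, the solution to $\mathbf{F}(\mathbf{x}) = \mathbf{0}$.


We first assume that $\mathbf{x}(\lambda)$ is the unique solution to the equation
\[ \mathbf{G}(\lambda, \mathbf{x}) = \mathbf{0}, \tag{10.20} \]
for each $\lambda \in [0, 1]$. The set $\{\mathbf{x}(\lambda) \mid 0 \leq \lambda \leq 1\}$ can be viewed as a curve in $\mathbb{R}^n$ from $\mathbf{x}(0)$
to $\mathbf{x}(1) = \mathbf{x}^*$ parameterized by $\lambda$. A continuation method finds a sequence of steps along
this curve corresponding to $\{\mathbf{x}(\lambda_k)\}_{k=0}^{m}$, where $\lambda_0 = 0 < \lambda_1 < \cdots < \lambda_m = 1$.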

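If the functions $\lambda \to \mathbf{x}(\lambda)$ and $\mathbf{G}$ are differentiable, then differentiating Eq. (10.20)
with respect to $\lambda$ gives

\[ \mathbf{0} = \frac{\partial \mathbf{G}(\lambda, \mathbf{x}(\lambda))}{\partial \lambda} + \frac{\partial \mathbf{G}(\lambda, \mathbf{x}(\lambda))}{\partial \mathbf{x}}\, \mathbf{x}'(\lambda), \]

and solving for $\mathbf{x}'(\lambda)$ gives

\[ \mathbf{x}'(\lambda) = -\left[\frac{\partial \mathbf{G}(\lambda, \mathbf{x}(\lambda))}{\partial \mathbf{x}}\right]^{-1} \frac{\partial \mathbf{G}(\lambda, \mathbf{x}(\lambda))}{\partial \lambda}. \]

This is a system of differential equations with the initial condition $\mathbf{x}(0)$.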
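Since

\[ \mathbf{G}(\lambda, \mathbf{x}(\lambda)) = \mathbf{F}(\mathbf{x}(\lambda)) + (\lambda - 1)\mathbf{F}(\mathbf{x}(0)), \]

we can determine both

\[
\frac{\partial \mathbf{G}}{\partial \mathbf{x}}(\lambda, \mathbf{x}(\lambda)) =
\begin{bmatrix}
\dfrac{\partial f_1}{\partial x_1}(\mathbf{x}(\lambda)) & \dfrac{\partial f_1}{\partial x_2}(\mathbf{x}(\lambda)) & \cdots & \dfrac{\partial f_1}{\partial x_n}(\mathbf{x}(\lambda)) \\
\dfrac{\partial f_2}{\partial x_1}(\mathbf{x}(\lambda)) & \dfrac{\partial f_2}{\partial x_2}(\mathbf{x}(\lambda)) & \cdots & \dfrac{\partial f_2}{\partial x_n}(\mathbf{x}(\lambda)) \\
\vdots & \vdots & & \vdots \\
\dfrac{\partial f_n}{\partial x_1}(\mathbf{x}(\lambda)) & \dfrac{\partial f_n}{\partial x_2}(\mathbf{x}(\lambda)) & \cdots & \dfrac{\partial f_n}{\partial x_n}(\mathbf{x}(\lambda))
\end{bmatrix}
= J(\mathbf{x}(\lambda)),
\]

the Jacobian matrix, and

\[ \frac{\partial \mathbf{G}(\lambda, \mathbf{x}(\lambda))}{\partial \lambda} = \mathbf{F}(\mathbf{x}(0)). \]

Therefore, the system of differential equations becomes

\[ \mathbf{x}'(\lambda) = -[J(\mathbf{x}(\lambda))]^{-1}\mathbf{F}(\mathbf{x}(0)), \quad \text{for } 0 \leq \lambda \leq 1, \tag{10.21} \]

with the initial condition $\mathbf{x}(0)$. The following theorem (see [OR], pp. 230-231) gives conditions
under which the continuation method is feasible.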


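Theorem 10.10 Let $\mathbf{F}(\mathbf{x})$ be continuously differentiable for $\mathbf{x} \in \mathbb{R}^n$. Suppose that the Jacobian matrix $J(\mathbf{x})$ is
nonsingular for all $\mathbf{x} \in \mathbb{R}^n$ and that a constant $M$ exists with $\|J(\mathbf{x})^{-1}\| \leq M$, for all $\mathbf{x} \in \mathbb{R}^n$.
Then, for any $\mathbf{x}(0)$ in $\mathbb{R}^n$, there exists a unique function $\mathbf{x}(\lambda)$, such that
\[ \mathbf{G}(\lambda, \mathbf{x}(\lambda)) = \mathbf{0}, \]
for all $\lambda$ in $[0, 1]$. Moreover, $\mathbf{x}(\lambda)$ is continuously differentiable and

\[ \mathbf{x}'(\lambda) = -J(\mathbf{x}(\lambda))^{-1}\mathbf{F}(\mathbf{x}(0)), \quad \text{for each } \lambda \in [0, 1]. \]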


The following shows the form of the system of differential equations associated with 
a nonlinear system of equations. 


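Illustration Consider the nonlinear system

\[
\begin{aligned}
f_1(x_1, x_2, x_3) &= 3x_1 - \cos(x_2 x_3) - 0.5 = 0, \\
f_2(x_1, x_2, x_3) &= x_1^2 - 81(x_2 + 0.1)^2 + \sin x_3 + 1.06 = 0, \\
f_3(x_1, x_2, x_3) &= e^{-x_1 x_2} + 20x_3 + \frac{10\pi - 3}{3} = 0.
\end{aligned}
\]

The Jacobian matrix is

\[
J(\mathbf{x}) =
\begin{bmatrix}
3 & x_3 \sin x_2 x_3 & x_2 \sin x_2 x_3 \\
2x_1 & -162(x_2 + 0.1) & \cos x_3 \\
-x_2 e^{-x_1 x_2} & -x_1 e^{-x_1 x_2} & 20
\end{bmatrix}.
\]

Let $\mathbf{x}(0) = (0, 0, 0)^t$, so that

\[ \mathbf{F}(\mathbf{x}(0)) = \begin{bmatrix} -1.5 \\ 0.25 \\ 10\pi/3 \end{bmatrix}. \]

The system of differential equations is

\[
\begin{bmatrix} x_1'(\lambda) \\ x_2'(\lambda) \\ x_3'(\lambda) \end{bmatrix}
= -\begin{bmatrix}
3 & x_3 \sin x_2 x_3 & x_2 \sin x_2 x_3 \\
2x_1 & -162(x_2 + 0.1) & \cos x_3 \\
-x_2 e^{-x_1 x_2} & -x_1 e^{-x_1 x_2} & 20
\end{bmatrix}^{-1}
\begin{bmatrix} -1.5 \\ 0.25 \\ 10\pi/3 \end{bmatrix}.
\]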
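In general, the system of differential equations that we need to solve for our continuation
problem has the form

\[
\begin{aligned}
\frac{dx_1}{d\lambda} &= \phi_1(\lambda, x_1, x_2, \ldots, x_n), \\
\frac{dx_2}{d\lambda} &= \phi_2(\lambda, x_1, x_2, \ldots, x_n), \\
&\;\;\vdots \\
\frac{dx_n}{d\lambda} &= \phi_n(\lambda, x_1, x_2, \ldots, x_n),
\end{aligned}
\]

where

\[
\begin{bmatrix}
\phi_1(\lambda, x_1, \ldots, x_n) \\
\phi_2(\lambda, x_1, \ldots, x_n) \\
\vdots \\
\phi_n(\lambda, x_1, \ldots, x_n)
\end{bmatrix}
= -J(x_1, \ldots, x_n)^{-1}
\begin{bmatrix}
f_1(\mathbf{x}(0)) \\
f_2(\mathbf{x}(0)) \\
\vdots \\
f_n(\mathbf{x}(0))
\end{bmatrix}. \tag{10.22}
\]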

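To use the Runge-Kutta method of order four to solve this system, we first choose an
integer $N > 0$ and let $h = (1 - 0)/N$. Partition the interval $[0, 1]$ into $N$ subintervals with
the mesh points

\[ \lambda_j = jh, \quad \text{for each } j = 0, 1, \ldots, N. \]

We use the notation $w_{i,j}$, for each $j = 0, 1, \ldots, N$ and $i = 1, \ldots, n$, to denote an approximation
to $x_i(\lambda_j)$. For the initial conditions, set

\[ w_{1,0} = x_1(0), \quad w_{2,0} = x_2(0), \quad \ldots, \quad w_{n,0} = x_n(0). \]

Suppose $w_{1,j}, w_{2,j}, \ldots, w_{n,j}$ have been computed. We obtain $w_{1,j+1}, w_{2,j+1}, \ldots, w_{n,j+1}$
using the equations

\[
\begin{aligned}
k_{1,i} &= h\phi_i(\lambda_j, w_{1,j}, w_{2,j}, \ldots, w_{n,j}), && \text{for each } i = 1, 2, \ldots, n; \\
k_{2,i} &= h\phi_i\left(\lambda_j + \tfrac{h}{2}, w_{1,j} + \tfrac{1}{2}k_{1,1}, \ldots, w_{n,j} + \tfrac{1}{2}k_{1,n}\right), && \text{for each } i = 1, 2, \ldots, n; \\
k_{3,i} &= h\phi_i\left(\lambda_j + \tfrac{h}{2}, w_{1,j} + \tfrac{1}{2}k_{2,1}, \ldots, w_{n,j} + \tfrac{1}{2}k_{2,n}\right), && \text{for each } i = 1, 2, \ldots, n; \\
k_{4,i} &= h\phi_i(\lambda_j + h, w_{1,j} + k_{3,1}, w_{2,j} + k_{3,2}, \ldots, w_{n,j} + k_{3,n}), && \text{for each } i = 1, 2, \ldots, n;
\end{aligned}
\]

and, finally

\[ w_{i,j+1} = w_{i,j} + \frac{1}{6}\left(k_{1,i} + 2k_{2,i} + 2k_{3,i} + k_{4,i}\right), \quad \text{for each } i = 1, 2, \ldots, n. \]

The vector notation

\[
\mathbf{k}_1 = \begin{bmatrix} k_{1,1} \\ k_{1,2} \\ \vdots \\ k_{1,n} \end{bmatrix}, \quad
\mathbf{k}_2 = \begin{bmatrix} k_{2,1} \\ k_{2,2} \\ \vdots \\ k_{2,n} \end{bmatrix}, \quad
\mathbf{k}_3 = \begin{bmatrix} k_{3,1} \\ k_{3,2} \\ \vdots \\ k_{3,n} \end{bmatrix}, \quad
\mathbf{k}_4 = \begin{bmatrix} k_{4,1} \\ k_{4,2} \\ \vdots \\ k_{4,n} \end{bmatrix}, \quad \text{and} \quad
\mathbf{w}_j = \begin{bmatrix} w_{1,j} \\ w_{2,j} \\ \vdots \\ w_{n,j} \end{bmatrix}
\]

simplifies the presentation. Then Eq. (10.22) gives us $\mathbf{x}(0) = \mathbf{x}(\lambda_0) = \mathbf{w}_0$, and for each
$j = 0, 1, \ldots, N$,

\[
\begin{aligned}
\mathbf{k}_1 &= h \begin{bmatrix} \phi_1(\lambda_j, w_{1,j}, \ldots, w_{n,j}) \\ \phi_2(\lambda_j, w_{1,j}, \ldots, w_{n,j}) \\ \vdots \\ \phi_n(\lambda_j, w_{1,j}, \ldots, w_{n,j}) \end{bmatrix}
= h\left[-J(w_{1,j}, \ldots, w_{n,j})\right]^{-1}\mathbf{F}(\mathbf{x}(0)) = h\left[-J(\mathbf{w}_j)\right]^{-1}\mathbf{F}(\mathbf{x}(0)); \\
\mathbf{k}_2 &= h\left[-J\left(\mathbf{w}_j + \tfrac{1}{2}\mathbf{k}_1\right)\right]^{-1}\mathbf{F}(\mathbf{x}(0)); \\
\mathbf{k}_3 &= h\left[-J\left(\mathbf{w}_j + \tfrac{1}{2}\mathbf{k}_2\right)\right]^{-1}\mathbf{F}(\mathbf{x}(0)); \\
\mathbf{k}_4 &= h\left[-J\left(\mathbf{w}_j + \mathbf{k}_3\right)\right]^{-1}\mathbf{F}(\mathbf{x}(0));
\end{aligned}
\]

and

\[ \mathbf{x}(\lambda_{j+1}) = \mathbf{x}(\lambda_j) + \frac{1}{6}\left(\mathbf{k}_1 + 2\mathbf{k}_2 + 2\mathbf{k}_3 + \mathbf{k}_4\right) = \mathbf{w}_j + \frac{1}{6}\left(\mathbf{k}_1 + 2\mathbf{k}_2 + 2\mathbf{k}_3 + \mathbf{k}_4\right). \]

Finally, $\mathbf{x}(\lambda_N) = \mathbf{x}(1)$ is our approximation to $\mathbf{x}^*$.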


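Example 1 Use the Continuation method with $\mathbf{x}(0) = (0, 0, 0)^t$ to approximate the solution to

\[
\begin{aligned}
f_1(x_1, x_2, x_3) &= 3x_1 - \cos(x_2 x_3) - 0.5 = 0, \\
f_2(x_1, x_2, x_3) &= x_1^2 - 81(x_2 + 0.1)^2 + \sin x_3 + 1.06 = 0, \\
f_3(x_1, x_2, x_3) &= e^{-x_1 x_2} + 20x_3 + \frac{10\pi - 3}{3} = 0.
\end{aligned}
\]

Solution The Jacobian matrix is

\[
J(\mathbf{x}) =
\begin{bmatrix}
3 & x_3 \sin x_2 x_3 & x_2 \sin x_2 x_3 \\
2x_1 & -162(x_2 + 0.1) & \cos x_3 \\
-x_2 e^{-x_1 x_2} & -x_1 e^{-x_1 x_2} & 20
\end{bmatrix}
\]

and

\[ \mathbf{F}(\mathbf{x}(0)) = (-1.5, 0.25, 10\pi/3)^t. \]

With $N = 4$ and $h = 0.25$, we have

\[
\mathbf{k}_1 = h[-J(\mathbf{x}(0))]^{-1}\mathbf{F}(\mathbf{x}(0)) = -0.25
\begin{bmatrix} 3 & 0 & 0 \\ 0 & -16.2 & 1 \\ 0 & 0 & 20 \end{bmatrix}^{-1}
\begin{bmatrix} -1.5 \\ 0.25 \\ 10\pi/3 \end{bmatrix}
= (0.125, -0.004222203325, -0.1308996939)^t;
\]

\[
\begin{aligned}
\mathbf{k}_2 &= h[-J(0.0625, -0.002111101663, -0.06544984695)]^{-1}(-1.5, 0.25, 10\pi/3)^t \\
&= -0.25
\begin{bmatrix}
3 & -0.9043289149 \times 10^{-5} & -0.2916936196 \times 10^{-6} \\
0.125 & -15.85800153 & 0.9978589232 \\
0.002111380229 & -0.06250824706 & 20
\end{bmatrix}^{-1}
\begin{bmatrix} -1.5 \\ 0.25 \\ 10\pi/3 \end{bmatrix} \\
&= (0.1249999773, -0.003311761993, -0.1309232406)^t;
\end{aligned}
\]

\[
\begin{aligned}
\mathbf{k}_3 &= h[-J(0.06249998865, -0.001655880997, -0.0654616203)]^{-1}(-1.5, 0.25, 10\pi/3)^t \\
&= (0.1249999844, -0.003296244825, -0.130920346)^t;
\end{aligned}
\]

\[
\begin{aligned}
\mathbf{k}_4 &= h[-J(0.1249999844, -0.003296244825, -0.130920346)]^{-1}(-1.5, 0.25, 10\pi/3)^t \\
&= (0.1249998945, -0.00230206762, -0.1309346977)^t;
\end{aligned}
\]

and

\[
\mathbf{x}(\lambda_1) = \mathbf{w}_1 = \mathbf{w}_0 + \frac{1}{6}\left[\mathbf{k}_1 + 2\mathbf{k}_2 + 2\mathbf{k}_3 + \mathbf{k}_4\right]
= (0.1249999697, -0.00329004743, -0.1309202608)^t.
\]

Continuing, we have

\[
\begin{aligned}
\mathbf{x}(\lambda_2) &= \mathbf{w}_2 = (0.2499997679, -0.004507400128, -0.2618557619)^t, \\
\mathbf{x}(\lambda_3) &= \mathbf{w}_3 = (0.3749996956, -0.003430352103, -0.3927634423)^t,
\end{aligned}
\]

and

\[ \mathbf{x}(\lambda_4) = \mathbf{x}(1) = \mathbf{w}_4 = (0.4999999954, 0.126782 \times 10^{-7}, -0.5235987758)^t. \]

These results are very accurate: the actual solution is $(0.5, 0, -0.52359877)^t$.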


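Note that in the Runge-Kutta methods, the steps similar to
\[ \mathbf{k}_i = h\left[-J(\mathbf{x}(\lambda_i) + \alpha_{i-1}\mathbf{k}_{i-1})\right]^{-1}\mathbf{F}(\mathbf{x}(0)) \]
can be written as solving for $\mathbf{k}_i$ in the linear system
\[ J(\mathbf{x}(\lambda_i) + \alpha_{i-1}\mathbf{k}_{i-1})\,\mathbf{k}_i = -h\mathbf{F}(\mathbf{x}(0)). \]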


So in the Runge-Kutta method of order four, the calculation of each w; requires four linear 
systems to be solved, one each when computing k,, kz, k3, and ky. Thus using N steps 
requires solving 4N linear systems. By comparison, Newton’s method requires solving one 
linear system per iteration. Therefore, the work involved for the Runge-Kutta method is 
roughly equivalent to 4N iterations of Newton’s method. 

An alternative is to use a Runge-Kutta method of order two, such as the modified Euler 
method or even Euler’s method, to decrease the number of linear systems that need to be 
solved. Another possibility is to use smaller values of N. The following illustrates these 
ideas. 


Illustration Table 10.6 summarizes a comparison of Euler’s method, the Midpoint method, and the 
Runge-Kutta method of order four applied to the problem in the example, with initial 
approximation x(0) = (0,0,0)'. The right-hand column in the table lists the number of 
linear systems that are required for the solution. 
Table 10.6

Method         N    x(1)                                                            Systems
Euler          1    $(0.5, -0.0168888133, -0.5235987755)^t$                         1
Euler          4    $(0.499999379, -0.004309160698, -0.523679652)^t$                4
Midpoint       1    $(0.4999966628, -0.00040240435, -0.523815371)^t$                2
Midpoint       4    $(0.500000066, -0.00001760089, -0.5236127761)^t$                8
Runge-Kutta    1    $(0.4999989843, -0.1676151 \times 10^{-5}, -0.5235989561)^t$    4
Runge-Kutta    4    $(0.4999999954, 0.126782 \times 10^{-7}, -0.5235987758)^t$      16
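
To make the cheapest row of the comparison concrete, here is a hedged Python sketch of the continuation method driven by Euler's method, which solves one linear system per step. The helper names `solve` and `continuation_euler` are assumptions of this sketch, and the small Gaussian-elimination routine is included only to keep the example self-contained.

```python
import math


def F(x):
    """Residuals of the system used in the example."""
    x1, x2, x3 = x
    return [3 * x1 - math.cos(x2 * x3) - 0.5,
            x1 ** 2 - 81 * (x2 + 0.1) ** 2 + math.sin(x3) + 1.06,
            math.exp(-x1 * x2) + 20 * x3 + (10 * math.pi - 3) / 3]


def J(x):
    """Jacobian matrix of F."""
    x1, x2, x3 = x
    s, e = math.sin(x2 * x3), math.exp(-x1 * x2)
    return [[3.0, x3 * s, x2 * s],
            [2 * x1, -162 * (x2 + 0.1), math.cos(x3)],
            [-x2 * e, -x1 * e, 20.0]]


def solve(A, b):
    """Gaussian elimination with partial pivoting for a small dense system."""
    n = len(b)
    M = [list(row) + [bi] for row, bi in zip(A, b)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            m = M[r][c] / M[c][c]
            for j in range(c, n + 1):
                M[r][j] -= m * M[c][j]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x


def continuation_euler(F, J, x, N):
    """Euler's method applied to x'(lam) = -J(x)^(-1) F(x(0)) on [0, 1]."""
    h = 1.0 / N
    b = [-h * f for f in F(x)]       # -h F(x(0)); fixed for every step
    for _ in range(N):
        k = solve(J(x), b)           # one linear system per step
        x = [xi + ki for xi, ki in zip(x, k)]
    return x


x_euler = continuation_euler(F, J, [0.0, 0.0, 0.0], N=1)
```

With `N=1` the single Euler step solves $J(\mathbf{x}(0))\mathbf{k} = -\mathbf{F}(\mathbf{x}(0))$, so `x_euler` can be compared with the first row of Table 10.6.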


The continuation method can be used as a stand-alone method, and does not require a 
particularly good choice of x(0). However, the method can also be used to give an initial 
approximation for Newton’s or Broyden’s method. For example, the result obtained in 
Example 2 using Euler’s method and N = 2 might easily be sufficient to start the more 
efficient Newton’s or Broyden’s methods and be better for this purpose than the continuation 
methods, which require more calculation. 


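Continuation Algorithm


To approximate the solution of the nonlinear system $\mathbf{F}(\mathbf{x}) = \mathbf{0}$ given an initial approximation $\mathbf{x}$:


INPUT number $n$ of equations and unknowns; integer $N > 0$; initial approximation
$\mathbf{x} = (x_1, x_2, \ldots, x_n)^t$.


OUTPUT approximate solution $\mathbf{x} = (x_1, x_2, \ldots, x_n)^t$.


Step 1 Set $h = 1/N$;
           $\mathbf{b} = -h\mathbf{F}(\mathbf{x})$.

Step 2 For $i = 1, 2, \ldots, N$ do Steps 3-7.

    Step 3 Set $A = J(\mathbf{x})$;
               Solve the linear system $A\mathbf{k}_1 = \mathbf{b}$.

    Step 4 Set $A = J(\mathbf{x} + \frac{1}{2}\mathbf{k}_1)$;
               Solve the linear system $A\mathbf{k}_2 = \mathbf{b}$.

    Step 5 Set $A = J(\mathbf{x} + \frac{1}{2}\mathbf{k}_2)$;
               Solve the linear system $A\mathbf{k}_3 = \mathbf{b}$.

    Step 6 Set $A = J(\mathbf{x} + \mathbf{k}_3)$;
               Solve the linear system $A\mathbf{k}_4 = \mathbf{b}$.

    Step 7 Set $\mathbf{x} = \mathbf{x} + (\mathbf{k}_1 + 2\mathbf{k}_2 + 2\mathbf{k}_3 + \mathbf{k}_4)/6$.

Step 8 OUTPUT ($x_1, x_2, \ldots, x_n$);
       STOP.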
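
The Continuation Algorithm translates almost line for line into code. The Python sketch below is one possible rendering under stated assumptions: the names `solve` and `continuation_rk4` are hypothetical, the linear solver is a plain Gaussian elimination written out so the example has no dependencies, and the test problem is the system of Example 1.

```python
import math


def F(x):
    """Residuals of the system in Example 1."""
    x1, x2, x3 = x
    return [3 * x1 - math.cos(x2 * x3) - 0.5,
            x1 ** 2 - 81 * (x2 + 0.1) ** 2 + math.sin(x3) + 1.06,
            math.exp(-x1 * x2) + 20 * x3 + (10 * math.pi - 3) / 3]


def J(x):
    """Jacobian matrix of F."""
    x1, x2, x3 = x
    s, e = math.sin(x2 * x3), math.exp(-x1 * x2)
    return [[3.0, x3 * s, x2 * s],
            [2 * x1, -162 * (x2 + 0.1), math.cos(x3)],
            [-x2 * e, -x1 * e, 20.0]]


def solve(A, b):
    """Gaussian elimination with partial pivoting for a small dense system."""
    n = len(b)
    M = [list(row) + [bi] for row, bi in zip(A, b)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            m = M[r][c] / M[c][c]
            for j in range(c, n + 1):
                M[r][j] -= m * M[c][j]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x


def shifted(x, k, scale):
    """Return x + scale * k componentwise."""
    return [xi + scale * ki for xi, ki in zip(x, k)]


def continuation_rk4(F, J, x, N):
    """Steps 1-8: four linear solves per step, 4N in total."""
    h = 1.0 / N
    b = [-h * f for f in F(x)]                   # Step 1
    for _ in range(N):                           # Step 2
        k1 = solve(J(x), b)                      # Step 3
        k2 = solve(J(shifted(x, k1, 0.5)), b)    # Step 4
        k3 = solve(J(shifted(x, k2, 0.5)), b)    # Step 5
        k4 = solve(J(shifted(x, k3, 1.0)), b)    # Step 6
        x = [xi + (a + 2 * c + 2 * d + e) / 6    # Step 7
             for xi, a, c, d, e in zip(x, k1, k2, k3, k4)]
    return x


x_rk4 = continuation_rk4(F, J, [0.0, 0.0, 0.0], N=4)
```

Calling the function with `N=4` repeats the computation of Example 1, so `x_rk4` should agree closely with the value of $\mathbf{x}(1)$ reported there.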


EXERCISE SET 10.5

1. The nonlinear system
\[ f_1(x_1, x_2) = x_1^2 - x_2^2 + 2x_2 = 0, \quad f_2(x_1, x_2) = 2x_1 + x_2^2 - 6 = 0 \]
has two solutions, $(0.625204094, 2.179355825)^t$ and $(2.109511920, -1.334532188)^t$. Use the
continuation method and Euler’s method with $N = 2$ to approximate the solutions where
a. $\mathbf{x}(0) = (0, 0)^t$ b. $\mathbf{x}(0) = (1, 1)^t$ c. $\mathbf{x}(0) = (3, -2)^t$

2. Repeat Exercise 1 using the Runge-Kutta method of order four with $N = 1$.


3. Use the continuation method and Euler’s method with $N = 2$ on the following nonlinear systems.

a. $4x_1^2 - 20x_1 + \frac{1}{4}x_2^2 + 8 = 0$,
   $\frac{1}{2}x_1 x_2^2 + 2x_1 - 5x_2 + 8 = 0$.

b. $\sin(4\pi x_1 x_2) - 2x_2 - x_1 = 0$,
   $\left(\frac{4\pi - 1}{4\pi}\right)(e^{2x_1} - e) + 4e x_2^2 - 2e x_1 = 0$.

c. $3x_1 - \cos(x_2 x_3) - \frac{1}{2} = 0$,
   $4x_1^2 - 625x_2^2 + 2x_2 - 1 = 0$,
   $e^{-x_1 x_2} + 20x_3 + \frac{10\pi - 3}{3} = 0$.

d. $x_1^2 + x_2 - 37 = 0$,
   $x_1 - x_2^2 - 5 = 0$,
   $x_1 + x_2 + x_3 - 3 = 0$.

4. Use the continuation method and the Runge-Kutta method of order four with $N = 1$ on the following
nonlinear systems using $\mathbf{x}(0) = \mathbf{0}$. Are the answers here comparable to Newton’s method or are they
suitable initial approximations for Newton’s method?

a. $x_1(1 - x_1) + 4x_2 = 12$,
   $(x_1 - 2)^2 + (2x_2 - 3)^2 = 25$.
   Compare to 10.2(5c).

b. $5x_1^2 - x_2^2 = 0$,
   $x_2 - 0.25(\sin x_1 + \cos x_2) = 0$.
   Compare to 10.2(5d).

c. $15x_1 + x_2^2 - 4x_3 = 13$,
   $x_1^2 + 10x_2 - x_3 = 11$,
   $x_2^3 - 25x_3 = -22$.
   Compare to 10.2(6c).

d. $10x_1 - 2x_2^2 + x_2 - 2x_3 - 5 = 0$,
   $8x_2^2 + 4x_3^2 - 9 = 0$,
   $8x_2 x_3 + 4 = 0$.
   Compare to 10.2(6d).


5. Repeat Exercise 4 using the initial approximations obtained as follows. 
a. From 10.2(3c) b. From 10.2(3d) c. From 10.2(4c) d. From 10.2(4d)


6. Use the continuation method and the Runge-Kutta method of order four with $N = 1$ on Exercise 7 of
Section 10.2. Are the results as good as those obtained there?


7. Repeat Exercise 5 using $N = 2$.


8. Repeat Exercise 8 of Section 10.2 using the continuation method and the Runge-Kutta method of
order four with $N = 1$.


9. Repeat Exercise 9 of Section 10.2 using the continuation method and the Runge-Kutta method of
order four with $N = 2$.


10. Show that the continuation method and Euler’s method with $N = 1$ gives the same result as Newton’s
method for the first iteration; that is, with $\mathbf{x}(0) = \mathbf{x}^{(0)}$ we always obtain $\mathbf{x}(1) = \mathbf{x}^{(1)}$.


11. Show that the homotopy
\[ \mathbf{G}(\lambda, \mathbf{x}) = \mathbf{F}(\mathbf{x}) - e^{-\lambda}\mathbf{F}(\mathbf{x}(0)) \]
used in the continuation method with Euler’s method and $h = 1$ also duplicates Newton’s method for
any $\mathbf{x}^{(0)}$; that is, with $\mathbf{x}(0) = \mathbf{x}^{(0)}$, we have $\mathbf{x}(1) = \mathbf{x}^{(1)}$.


12. Let the continuation method with the Runge-Kutta method of order four be abbreviated CMRK4. 
After completing Exercises 4, 5, 6, 7, 8, and 9, answer the following questions. 


a. Is CMRK4 with N = 1 comparable to Newton’s method? Support your answer with the results 
of earlier exercises. 


b. Should CMRK4 with N = 1 be used as a means to obtain an initial approximation for Newton’s 
method? Support your answer with the results of earlier exercises. 


c. Repeat part (a) for CMRK4 with $N = 2$.

d. Repeat part (b) for CMRK4 with $N = 2$.

10.6 Survey of Methods and Software


In this chapter we considered methods to approximate solutions to nonlinear systems

\[
\begin{aligned}
f_1(x_1, x_2, \ldots, x_n) &= 0, \\
f_2(x_1, x_2, \ldots, x_n) &= 0, \\
&\;\;\vdots \\
f_n(x_1, x_2, \ldots, x_n) &= 0.
\end{aligned}
\]

Newton’s method for systems requires a good initial approximation $(x_1^{(0)}, x_2^{(0)}, \ldots, x_n^{(0)})^t$
and generates a sequence

\[ \mathbf{x}^{(k)} = \mathbf{x}^{(k-1)} - J(\mathbf{x}^{(k-1)})^{-1}\mathbf{F}(\mathbf{x}^{(k-1)}), \]

that converges rapidly to a solution $\mathbf{p}$ if $\mathbf{x}^{(0)}$ is sufficiently close to $\mathbf{p}$. However, Newton’s
method requires evaluating, or approximating, $n^2$ partial derivatives and solving an $n$ by $n$
linear system at each step. Solving the linear system requires $O(n^3)$ computations.

Broyden’s method reduces the amount of computation at each step without significantly
degrading the speed of convergence. This technique replaces the Jacobian matrix $J$ with a
matrix $A_{k-1}$ whose inverse is directly determined at each step. This reduces the arithmetic
computations from $O(n^3)$ to $O(n^2)$. Moreover, the only scalar function evaluations required
are in evaluating the $f_i$, saving $n^2$ scalar function evaluations per step. Broyden’s method
also requires a good initial approximation.

The Steepest Descent method was presented as a way to obtain good initial approximations
for Newton’s and Broyden’s methods. Although Steepest Descent does not give a
rapidly convergent sequence, it does not require a good initial approximation. The Steepest
Descent method approximates a minimum of a multivariable function $g$. For our application
we choose

\[ g(x_1, x_2, \ldots, x_n) = \sum_{i=1}^{n} \left[f_i(x_1, x_2, \ldots, x_n)\right]^2. \]

The minimum value of $g$ is 0, which occurs when the functions $f_i$ are simultaneously 0.

Homotopy and continuation methods are also used for nonlinear systems and are the
subject of current research (see [AG]). In these methods, a given problem

\[ \mathbf{F}(\mathbf{x}) = \mathbf{0} \]

is embedded in a one-parameter family of problems using a parameter $\lambda$ that assumes values
in $[0, 1]$. The original problem corresponds to $\lambda = 1$, and a problem with a known solution
corresponds to $\lambda = 0$. For example, the set of problems

\[ \mathbf{G}(\lambda, \mathbf{x}) = \lambda\mathbf{F}(\mathbf{x}) + (1 - \lambda)(\mathbf{F}(\mathbf{x}) - \mathbf{F}(\mathbf{x}_0)) = \mathbf{0}, \quad \text{for } 0 \leq \lambda \leq 1, \]

with fixed $\mathbf{x}_0 \in \mathbb{R}^n$ forms a homotopy. When $\lambda = 0$, the solution is $\mathbf{x}(\lambda = 0) = \mathbf{x}_0$.
The solution to the original problem corresponds to $\mathbf{x}(\lambda = 1)$. A continuation method
attempts to determine $\mathbf{x}(\lambda = 1)$ by solving the sequence of problems corresponding to
$\lambda_0 = 0 < \lambda_1 < \lambda_2 < \cdots < \lambda_m = 1$. The initial approximation to the solution of

\[ \lambda_i\mathbf{F}(\mathbf{x}) + (1 - \lambda_i)(\mathbf{F}(\mathbf{x}) - \mathbf{F}(\mathbf{x}_0)) = \mathbf{0} \]

would be the solution, $\mathbf{x}(\lambda = \lambda_{i-1})$, to the problem

\[ \lambda_{i-1}\mathbf{F}(\mathbf{x}) + (1 - \lambda_{i-1})(\mathbf{F}(\mathbf{x}) - \mathbf{F}(\mathbf{x}_0)) = \mathbf{0}. \]

The package Hompack in netlib solves a system of nonlinear equations by using various
homotopy methods.

The nonlinear systems methods in the IMSL and NAG libraries use the Levenberg- 
Marquardt method, which is a weighted average of Newton’s method and the Steepest 
Descent method. The weight is biased toward the Steepest Descent method until convergence 
is detected, at which time the weight is shifted toward the more rapidly convergent Newton’s 
method. In either routine a finite difference approximation to the Jacobian can be used or a 
user-supplied subroutine entered to compute the Jacobian. 

A comprehensive treatment of methods for solving nonlinear systems of equations
can be found in Ortega and Rheinboldt [OR] and in Dennis and Schnabel [DenS]. Recent
developments on iterative methods can be found in Argyros and Szidarovszky [AS], and
information on the use of continuation methods is available in Allgower and Georg [AG].
CHAPTER 11


Boundary-Value Problems for Ordinary 
Differential Equations 


Introduction 


A common problem in civil engineering concerns the deflection of a beam of rectangular 
cross section subject to uniform loading while the ends of the beam are supported so that 
they undergo no deflection. 


[Figure: a beam supported at both ends under a uniform load, with deflection $w(x)$.]


Suppose that $l$, $q$, $E$, $S$, and $I$ represent, respectively, the length of the beam, the intensity
of the uniform load, the modulus of elasticity, the stress at the endpoints, and the central
moment of inertia. The differential equation approximating the physical situation is of the
form

\[ \frac{d^2 w}{dx^2}(x) = \frac{S}{EI}\,w(x) + \frac{qx}{2EI}(x - l), \]

where $w(x)$ is the deflection a distance $x$ from the left end of the beam. Since no deflection
occurs at the ends of the beam, there are two boundary conditions

\[ w(0) = 0 \quad \text{and} \quad w(l) = 0. \]

When the beam is of uniform thickness, the product $EI$ is constant. In this case the
exact solution is easily obtained. When the thickness is not uniform, the moment of inertia
$I$ is a function of $x$, and approximation techniques are required. Problems of this type are
considered in Exercises 7 of Section 11.3 and 6 of Section 11.4.

The differential equations in Chapter 5 are of first order and have one initial condition 
to satisfy. Later in the chapter we saw that the techniques could be extended to systems of 
equations and then to higher-order equations, but all the specified conditions are on the same 
endpoint. These are initial-value problems. In this chapter we show how to approximate 
the solution to boundary-value problems, differential equations with conditions imposed 
at different points. For first-order differential equations, only one condition is specified, 
so there is no distinction between initial-value and boundary-value problems. We will be 
considering second-order equations with two boundary values. 

Physical problems that are position-dependent rather than time-dependent are often 
described in terms of differential equations with conditions imposed at more than one point. 
The two-point boundary-value problems in this chapter involve a second-order differential
equation of the form

\[ y'' = f(x, y, y'), \quad \text{for } a \leq x \leq b, \tag{11.1} \]

together with the boundary conditions

\[ y(a) = \alpha \quad \text{and} \quad y(b) = \beta. \tag{11.2} \]


11.1 The Linear Shooting Method


The following theorem gives general conditions that ensure the solution to a second-order 
boundary value problem exists and is unique. The proof of this theorem can be found in 
[Keller, H]. 


Theorem 11.1 Suppose the function $f$ in the boundary-value problem
\[ y'' = f(x, y, y'), \quad \text{for } a \leq x \leq b, \text{ with } y(a) = \alpha \text{ and } y(b) = \beta, \]
is continuous on the set
\[ D = \{(x, y, y') \mid \text{for } a \leq x \leq b, \text{ with } -\infty < y < \infty \text{ and } -\infty < y' < \infty\}, \]
and that the partial derivatives $f_y$ and $f_{y'}$ are also continuous on $D$. If

(i) $f_y(x, y, y') > 0$, for all $(x, y, y') \in D$, and

(ii) a constant $M$ exists, with
\[ |f_{y'}(x, y, y')| \leq M, \quad \text{for all } (x, y, y') \in D, \]

then the boundary-value problem has a unique solution.


Example 1 Use Theorem 11.1 to show that the boundary-value problem
\[ y'' + e^{-xy} + \sin y' = 0, \quad \text{for } 1 \leq x \leq 2, \text{ with } y(1) = y(2) = 0, \]
has a unique solution.


Solution We have
\[ f(x, y, y') = -e^{-xy} - \sin y', \]
and for all $x$ in $[1, 2]$,
\[ f_y(x, y, y') = xe^{-xy} > 0 \quad \text{and} \quad |f_{y'}(x, y, y')| = |-\cos y'| \leq 1. \]

So the problem has a unique solution.
Linear Boundary-Value Problems

The differential equation
\[ y'' = f(x, y, y') \]
is linear when functions $p(x)$, $q(x)$, and $r(x)$ exist with
\[ f(x, y, y') = p(x)y' + q(x)y + r(x). \]

A linear equation involves only linear powers of $y$ and its derivatives.

Problems of this type frequently occur, and in this situation, Theorem 11.1 can be simplified.


Corollary 11.2 Suppose the linear boundary-value problem
\[ y'' = p(x)y' + q(x)y + r(x), \quad \text{for } a \leq x \leq b, \text{ with } y(a) = \alpha \text{ and } y(b) = \beta, \]
satisfies

(i) $p(x)$, $q(x)$, and $r(x)$ are continuous on $[a, b]$,

(ii) $q(x) > 0$ on $[a, b]$.

Then the boundary-value problem has a unique solution.


To approximate the unique solution to this linear problem, we first consider the initial-value
problems

\[ y'' = p(x)y' + q(x)y + r(x), \quad \text{with } a \leq x \leq b, \; y(a) = \alpha, \text{ and } y'(a) = 0, \tag{11.3} \]

and

\[ y'' = p(x)y' + q(x)y, \quad \text{with } a \leq x \leq b, \; y(a) = 0, \text{ and } y'(a) = 1. \tag{11.4} \]

Theorem 5.17 in Section 5.9 (see page 329) ensures that under the hypotheses in
Corollary 11.2, both problems have a unique solution.

Let $y_1(x)$ denote the solution to (11.3), and let $y_2(x)$ denote the solution to (11.4).
Assume that $y_2(b) \neq 0$. (That $y_2(b) = 0$ is in conflict with the hypotheses of Corollary 11.2
is considered in Exercise 8.) Define

\[ y(x) = y_1(x) + \frac{\beta - y_1(b)}{y_2(b)}\,y_2(x). \tag{11.5} \]

Then $y(x)$ is the solution to the linear boundary problem (11.3). To see this, first note that

\[ y'(x) = y_1'(x) + \frac{\beta - y_1(b)}{y_2(b)}\,y_2'(x) \]

and

\[ y''(x) = y_1''(x) + \frac{\beta - y_1(b)}{y_2(b)}\,y_2''(x). \]

Substituting for $y_1''(x)$ and $y_2''(x)$ in this equation gives

\[
\begin{aligned}
y'' &= p(x)y_1' + q(x)y_1 + r(x) + \frac{\beta - y_1(b)}{y_2(b)}\left(p(x)y_2' + q(x)y_2\right) \\
&= p(x)\left(y_1' + \frac{\beta - y_1(b)}{y_2(b)}\,y_2'\right) + q(x)\left(y_1 + \frac{\beta - y_1(b)}{y_2(b)}\,y_2\right) + r(x) \\
&= p(x)y'(x) + q(x)y(x) + r(x).
\end{aligned}
\]
Moreover,
\[ y(a) = y_1(a) + \frac{\beta - y_1(b)}{y_2(b)}\,y_2(a) = \alpha + \frac{\beta - y_1(b)}{y_2(b)} \cdot 0 = \alpha \]
and
\[ y(b) = y_1(b) + \frac{\beta - y_1(b)}{y_2(b)}\,y_2(b) = y_1(b) + \beta - y_1(b) = \beta. \]
Linear Shooting

The Shooting method for linear equations is based on the replacement of the linear boundary-value
problem by the two initial-value problems (11.3) and (11.4). Numerous methods are
available from Chapter 5 for approximating the solutions $y_1(x)$ and $y_2(x)$, and once these
approximations are available, the solution to the boundary-value problem is approximated
using Eq. (11.5). Graphically, the method has the appearance shown in Figure 11.1.

This “shooting” hits the target after one trial shot. In the next section we see that nonlinear
problems require multiple shots.

[Figure 11.1: graphs of $y_1(x)$, $y_2(x)$, and $y(x) = y_1(x) + \frac{\beta - y_1(b)}{y_2(b)}\,y_2(x)$ on $[a, b]$.]


Algorithm 11.1 uses the fourth-order Runge-Kutta technique to find the approximations 
to y1(x) and y2(x), but other techniques for approximating the solutions to initial-value 
problems can be substituted into Step 4. 

The algorithm has the additional feature of obtaining approximations for the derivative 
of the solution to the boundary-value problem as well as to the solution of the problem 
itself. The use of the algorithm is not restricted to those problems for which the hypotheses 
of Corollary 11.2 can be verified; it will work for many problems that do not satisfy these 
hypotheses. One such example can be found in Exercise 4. 


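Linear Shooting
To approximate the solution of the boundary-value problem

\[ -y'' + p(x)y' + q(x)y + r(x) = 0, \quad \text{for } a \leq x \leq b, \text{ with } y(a) = \alpha \text{ and } y(b) = \beta, \]

(Note: Equations (11.3) and (11.4) are written as first-order systems and solved.)


INPUT endpoints $a$, $b$; boundary conditions $\alpha$, $\beta$; number of subintervals $N$.


OUTPUT approximations $w_{1,i}$ to $y(x_i)$; $w_{2,i}$ to $y'(x_i)$ for each $i = 0, 1, \ldots, N$.

Step 1 Set $h = (b - a)/N$;
           $u_{1,0} = \alpha$;
           $u_{2,0} = 0$;
           $v_{1,0} = 0$;
           $v_{2,0} = 1$.

Step 2 For $i = 0, \ldots, N - 1$ do Steps 3 and 4.
       (The Runge-Kutta method for systems is used in Steps 3 and 4.)

    Step 3 Set $x = a + ih$.

    Step 4 Set $k_{1,1} = hu_{2,i}$;
               $k_{1,2} = h[p(x)u_{2,i} + q(x)u_{1,i} + r(x)]$;
               $k_{2,1} = h[u_{2,i} + \frac{1}{2}k_{1,2}]$;
               $k_{2,2} = h[p(x + h/2)(u_{2,i} + \frac{1}{2}k_{1,2}) + q(x + h/2)(u_{1,i} + \frac{1}{2}k_{1,1}) + r(x + h/2)]$;
               $k_{3,1} = h[u_{2,i} + \frac{1}{2}k_{2,2}]$;
               $k_{3,2} = h[p(x + h/2)(u_{2,i} + \frac{1}{2}k_{2,2}) + q(x + h/2)(u_{1,i} + \frac{1}{2}k_{2,1}) + r(x + h/2)]$;
               $k_{4,1} = h[u_{2,i} + k_{3,2}]$;
               $k_{4,2} = h[p(x + h)(u_{2,i} + k_{3,2}) + q(x + h)(u_{1,i} + k_{3,1}) + r(x + h)]$;
               $u_{1,i+1} = u_{1,i} + \frac{1}{6}[k_{1,1} + 2k_{2,1} + 2k_{3,1} + k_{4,1}]$;
               $u_{2,i+1} = u_{2,i} + \frac{1}{6}[k_{1,2} + 2k_{2,2} + 2k_{3,2} + k_{4,2}]$;
               $k'_{1,1} = hv_{2,i}$;
               $k'_{1,2} = h[p(x)v_{2,i} + q(x)v_{1,i}]$;
               $k'_{2,1} = h[v_{2,i} + \frac{1}{2}k'_{1,2}]$;
               $k'_{2,2} = h[p(x + h/2)(v_{2,i} + \frac{1}{2}k'_{1,2}) + q(x + h/2)(v_{1,i} + \frac{1}{2}k'_{1,1})]$;
               $k'_{3,1} = h[v_{2,i} + \frac{1}{2}k'_{2,2}]$;
               $k'_{3,2} = h[p(x + h/2)(v_{2,i} + \frac{1}{2}k'_{2,2}) + q(x + h/2)(v_{1,i} + \frac{1}{2}k'_{2,1})]$;
               $k'_{4,1} = h[v_{2,i} + k'_{3,2}]$;
               $k'_{4,2} = h[p(x + h)(v_{2,i} + k'_{3,2}) + q(x + h)(v_{1,i} + k'_{3,1})]$;
               $v_{1,i+1} = v_{1,i} + \frac{1}{6}[k'_{1,1} + 2k'_{2,1} + 2k'_{3,1} + k'_{4,1}]$;
               $v_{2,i+1} = v_{2,i} + \frac{1}{6}[k'_{1,2} + 2k'_{2,2} + 2k'_{3,2} + k'_{4,2}]$.

Step 5 Set $w_{1,0} = \alpha$;
           $w_{2,0} = \dfrac{\beta - u_{1,N}}{v_{1,N}}$;
       OUTPUT ($a$, $w_{1,0}$, $w_{2,0}$).

Step 6 For $i = 1, \ldots, N$
           set $W1 = u_{1,i} + w_{2,0}v_{1,i}$;
               $W2 = u_{2,i} + w_{2,0}v_{2,i}$;
               $x = a + ih$;
           OUTPUT ($x$, $W1$, $W2$). (Output is $x_i$, $w_{1,i}$, $w_{2,i}$.)

Step 7 STOP. (The process is complete.)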
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CHAPTER 11 


Example 2   Apply the Linear Shooting technique with $N = 10$ to the boundary-value problem

$$y'' = -\frac{2}{x}y' + \frac{2}{x^2}y + \frac{\sin(\ln x)}{x^2}, \quad \text{for } 1 \le x \le 2, \text{ with } y(1) = 1 \text{ and } y(2) = 2,$$

and compare the results to those of the exact solution

$$y = c_1 x + \frac{c_2}{x^2} - \frac{3}{10}\sin(\ln x) - \frac{1}{10}\cos(\ln x),$$

where

$$c_2 = \frac{1}{70}[8 - 12\sin(\ln 2) - 4\cos(\ln 2)] \approx -0.03920701320$$

and

$$c_1 = \frac{11}{10} - c_2 \approx 1.1392070132.$$

Solution   Applying Algorithm 11.1 to this problem requires approximating the solutions to
the initial-value problems

$$y_1'' = -\frac{2}{x}y_1' + \frac{2}{x^2}y_1 + \frac{\sin(\ln x)}{x^2}, \quad \text{for } 1 \le x \le 2, \text{ with } y_1(1) = 1 \text{ and } y_1'(1) = 0,$$

$$y_2'' = -\frac{2}{x}y_2' + \frac{2}{x^2}y_2, \quad \text{for } 1 \le x \le 2, \text{ with } y_2(1) = 0 \text{ and } y_2'(1) = 1.$$

The results of the calculations, using Algorithm 11.1 with $N = 10$ and $h = 0.1$, are
given in Table 11.1. The value listed as $u_{1,i}$ approximates $y_1(x_i)$, the value $v_{1,i}$ approximates
$y_2(x_i)$, and $w_i$ approximates

$$y(x_i) = y_1(x_i) + \frac{2 - y_1(2)}{y_2(2)}\,y_2(x_i).$$

Table 11.1

x_i    u_{1,i} ≈ y_1(x_i)    v_{1,i} ≈ y_2(x_i)    w_i ≈ y(x_i)    y(x_i)        |y(x_i) − w_i|
1.0    1.00000000            0.00000000            1.00000000      1.00000000
1.1    1.00896058            0.09117986            1.09262917      1.09262930    1.43 × 10^-7
1.2    1.03245472            0.16851175            1.18708471      1.18708484    1.34 × 10^-7
1.3    1.06674375            0.23608704            1.28338227      1.28338236    9.78 × 10^-8
1.4    1.10928795            0.29659067            1.38144589      1.38144595    6.02 × 10^-8
1.5    1.15830000            0.35184379            1.48115939      1.48115942    3.06 × 10^-8
1.6    1.21248372            0.40311695            1.58239245      1.58239246    1.08 × 10^-8
1.7    1.27087454            0.45131840            1.68501396      1.68501396    5.43 × 10^-10
1.8    1.33273851            0.49711137            1.78889854      1.78889853    5.05 × 10^-9
1.9    1.39750618            0.54098928            1.89392951      1.89392951    4.41 × 10^-9
2.0    1.46472815            0.58332538            2.00000000      2.00000000
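
The calculation in this example can be sketched in a few lines of Python. This is a minimal illustration, not a transcription of Algorithm 11.1: the helper names `rk4_system` and `linear_shooting` are hypothetical, and the Runge-Kutta stages are written in vector form rather than with the individual $k_{i,j}$ of Step 4, although the arithmetic is the same classical fourth-order method.

```python
import math

def rk4_system(f, x0, y0, h, n):
    """Classical fourth-order Runge-Kutta for y' = f(x, y); returns all n+1 states."""
    states = [list(y0)]
    for i in range(n):
        x, y = x0 + i * h, states[-1]
        k1 = f(x, y)
        k2 = f(x + h / 2, [yj + h / 2 * kj for yj, kj in zip(y, k1)])
        k3 = f(x + h / 2, [yj + h / 2 * kj for yj, kj in zip(y, k2)])
        k4 = f(x + h, [yj + h * kj for yj, kj in zip(y, k3)])
        states.append([yj + h / 6 * (c1 + 2 * c2 + 2 * c3 + c4)
                       for yj, c1, c2, c3, c4 in zip(y, k1, k2, k3, k4)])
    return states

def linear_shooting(p, q, r, a, b, alpha, beta, n):
    """Approximate y'' = p y' + q y + r, y(a) = alpha, y(b) = beta by linear shooting."""
    h = (b - a) / n
    # y1 solves the full equation with y1(a) = alpha, y1'(a) = 0.
    u = rk4_system(lambda x, s: [s[1], p(x) * s[1] + q(x) * s[0] + r(x)],
                   a, [alpha, 0.0], h, n)
    # y2 solves the homogeneous equation with y2(a) = 0, y2'(a) = 1.
    v = rk4_system(lambda x, s: [s[1], p(x) * s[1] + q(x) * s[0]],
                   a, [0.0, 1.0], h, n)
    c = (beta - u[-1][0]) / v[-1][0]  # this is w_{2,0} in Step 5
    return [(a + i * h, u[i][0] + c * v[i][0], u[i][1] + c * v[i][1])
            for i in range(n + 1)]

# Data of Example 2.
p = lambda x: -2.0 / x
q = lambda x: 2.0 / x ** 2
r = lambda x: math.sin(math.log(x)) / x ** 2
c2 = (8 - 12 * math.sin(math.log(2)) - 4 * math.cos(math.log(2))) / 70
c1 = 1.1 - c2

def exact(x):
    return (c1 * x + c2 / x ** 2
            - 0.3 * math.sin(math.log(x)) - 0.1 * math.cos(math.log(x)))

result = linear_shooting(p, q, r, 1.0, 2.0, 1.0, 2.0, 10)
max_err = max(abs(w1 - exact(x)) for x, w1, _ in result)
```

Each entry of `result` holds $(x_i, w_{1,i}, w_{2,i})$, so the derivative approximations come along at no extra cost, and `max_err` can be compared with the last column of Table 11.1.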


The accurate results in this example are due to the fact that the fourth-order Runge-Kutta
method gives $O(h^4)$ approximations to the solutions of the initial-value problems.
Unfortunately, because of round-off errors, there can be problems hidden in this technique.

Reducing Round-Off Error

Round-off problems can occur if $y_1(x)$ rapidly increases as $x$ goes from $a$ to $b$. In this
case $u_{1,N} \approx y_1(b)$ will be large, and if $\beta$ is small in magnitude compared to $u_{1,N}$, the term
$w_{2,0} = (\beta - u_{1,N})/v_{1,N}$ will be approximately $-u_{1,N}/v_{1,N}$. The computations in Step 6 then
become

$$W1 = u_{1,i} + w_{2,0}v_{1,i} \approx u_{1,i} - \left(\frac{u_{1,N}}{v_{1,N}}\right)v_{1,i},$$

$$W2 = u_{2,i} + w_{2,0}v_{2,i} \approx u_{2,i} - \left(\frac{u_{1,N}}{v_{1,N}}\right)v_{2,i},$$

which allows a possibility of a loss of significant digits due to cancellation. However, because
$u_{1,i}$ is an approximation to $y_1(x_i)$, the behavior of $y_1$ can easily be monitored, and if $u_{1,i}$
increases rapidly from $a$ to $b$, the shooting technique can be employed backward from
$x_0 = b$ to $x_N = a$. Because the integration now starts at $x = b$, where the boundary value is $\beta$,
this changes the initial-value problems that need to be solved to

$$y'' = p(x)y' + q(x)y + r(x), \quad \text{for } a \le x \le b, \text{ with } y(b) = \beta \text{ and } y'(b) = 0,$$

and

$$y'' = p(x)y' + q(x)y, \quad \text{for } a \le x \le b, \text{ with } y(b) = 0 \text{ and } y'(b) = 1.$$

If this reverse shooting technique still gives cancellation of significant digits and if increased
precision does not yield greater accuracy, other techniques must be used. Some of these are
presented later in this chapter. In general, however, if $u_{1,i}$ and $v_{1,i}$ are $O(h^n)$ approximations
to $y_1(x_i)$ and $y_2(x_i)$, respectively, for each $i = 0, 1, \ldots, N$, then $w_{1,i}$ will be an $O(h^n)$
approximation to $y(x_i)$. In particular,

$$|w_{1,i} - y(x_i)| \le K h^n \left|1 + \frac{v_{1,i}}{v_{1,N}}\right|,$$

for some constant $K$ (see [IK], p. 426).


EXERCISE SET 11.1

1. The boundary-value problem

   $$y'' = 4(y - x), \quad 0 \le x \le 1, \quad y(0) = 0, \quad y(1) = 2,$$

   has the solution $y(x) = e^2(e^4 - 1)^{-1}(e^{2x} - e^{-2x}) + x$. Use the Linear Shooting method to approximate
   the solution, and compare the results to the actual solution.
   a. With $h = \frac{1}{2}$;   b. With $h = \frac{1}{4}$.

2. The boundary-value problem

   $$y'' = y' + 2y + \cos x, \quad 0 \le x \le \frac{\pi}{2}, \quad y(0) = -0.3, \quad y\left(\frac{\pi}{2}\right) = -0.1$$

   has the solution $y(x) = -\frac{1}{10}(\sin x + 3\cos x)$. Use the Linear Shooting method to approximate the
   solution, and compare the results to the actual solution.
   a. With $h = \frac{\pi}{4}$;   b. With $h = \frac{\pi}{8}$.

3. Use the Linear Shooting method to approximate the solution to the following boundary-value
   problems.
   a. $y'' = -3y' + 2y + 2x + 3$, $0 \le x \le 1$, $y(0) = 2$, $y(1) = 1$; use $h = 0.1$.
   b. $y'' = -4x^{-1}y' - 2x^{-2}y + 2x^{-2}\ln x$, $1 \le x \le 2$, $y(1) = -\frac{1}{2}$, $y(2) = \ln 2$; use $h = 0.05$.
   c. $y'' = -(x + 1)y' + 2y + (1 - x^2)e^{-x}$, $0 \le x \le 1$, $y(0) = -1$, $y(1) = 0$; use $h = 0.1$.
   d. $y'' = x^{-1}y' + 3x^{-2}y + x^{-1}\ln x - 1$, $1 \le x \le 2$, $y(1) = y(2) = 0$; use $h = 0.1$.

4. Although $q(x) < 0$ in the following boundary-value problems, unique solutions exist and are given.
   Use the Linear Shooting Algorithm to approximate the solutions to the following problems, and
   compare the results to the actual solutions.
   a. $y'' + y = 0$, $0 \le x \le \frac{\pi}{4}$, $y(0) = 1$, $y(\frac{\pi}{4}) = 1$; use $h = \frac{\pi}{20}$; actual solution
      $y(x) = \cos x + (\sqrt{2} - 1)\sin x$.
   b. $y'' + 4y = \cos x$, $0 \le x \le \frac{\pi}{4}$, $y(0) = 0$, $y(\frac{\pi}{4}) = 0$; use $h = \frac{\pi}{20}$; actual solution
      $y(x) = -\frac{1}{3}\cos 2x - \frac{\sqrt{2}}{6}\sin 2x + \frac{1}{3}\cos x$.
   c. $y'' = -4x^{-1}y' - 2x^{-2}y + 2x^{-2}\ln x$, $1 \le x \le 2$, $y(1) = \frac{1}{2}$, $y(2) = \ln 2$; use $h = 0.05$; actual
      solution $y(x) = 4x^{-1} - 2x^{-2} + \ln x - 3/2$.
   d. $y'' = 2y' - y + xe^x - x$, $0 \le x \le 2$, $y(0) = 0$, $y(2) = -4$; use $h = 0.2$; actual solution
      $y(x) = \frac{1}{6}x^3e^x - \frac{5}{3}xe^x + 2e^x - x - 2$.

5. Use the Linear Shooting Algorithm to approximate the solution $y = e^{-10x}$ to the boundary-value
   problem

   $$y'' = 100y, \quad 0 \le x \le 1, \quad y(0) = 1, \quad y(1) = e^{-10}.$$

   Use $h = 0.1$ and $0.05$.

6. Write the second-order initial-value problems (11.3) and (11.4) as first-order systems, and derive the
   equations necessary to solve the systems using the fourth-order Runge-Kutta method for systems.

7. Let $u$ represent the electrostatic potential between two concentric metal spheres of radii $R_1$ and $R_2$
   ($R_1 < R_2$). The potential of the inner sphere is kept constant at $V_1$ volts, and the potential of the
   outer sphere is 0 volts. The potential in the region between the two spheres is governed by Laplace's
   equation, which, in this particular application, reduces to

   $$\frac{d^2u}{dr^2} + \frac{2}{r}\frac{du}{dr} = 0, \quad R_1 \le r \le R_2, \quad u(R_1) = V_1, \quad u(R_2) = 0.$$

   Suppose $R_1 = 2$ in., $R_2 = 4$ in., and $V_1 = 110$ volts.
   a. Approximate $u(3)$ using the Linear Shooting Algorithm.
   b. Compare the results of part (a) with the actual potential $u(3)$, where

      $$u(r) = \frac{V_1R_1}{r}\left(\frac{R_2 - r}{R_2 - R_1}\right).$$

8. Show that, under the hypothesis of Corollary 11.2, if $y_2$ is the solution to $y'' = p(x)y' + q(x)y$ and
   $y_2(a) = y_2(b) = 0$, then $y_2 \equiv 0$.

9. Consider the boundary-value problem

   $$y'' + y = 0, \quad 0 \le x \le b, \quad y(0) = 0, \quad y(b) = B.$$

   Find choices for $b$ and $B$ so that the boundary-value problem has
   a. No solution   b. Exactly one solution   c. Infinitely many solutions.

10. Attempt to apply Exercise 9 to the boundary-value problem

    $$y'' - y = 0, \quad 0 \le x \le b, \quad y(0) = 0, \quad y(b) = B.$$

    What happens? How do both problems relate to Corollary 11.2?


11.2 The Shooting Method for Nonlinear Problems

The shooting technique for the nonlinear second-order boundary-value problem

$$y'' = f(x, y, y'), \quad \text{for } a \le x \le b, \text{ with } y(a) = \alpha \text{ and } y(b) = \beta, \tag{11.6}$$

is similar to the linear technique, except that the solution to a nonlinear problem cannot be
expressed as a linear combination of the solutions to two initial-value problems. Instead,
we approximate the solution to the boundary-value problem by using the solutions to a
sequence of initial-value problems involving a parameter $t$. These problems have the form

$$y'' = f(x, y, y'), \quad \text{for } a \le x \le b, \text{ with } y(a) = \alpha \text{ and } y'(a) = t. \tag{11.7}$$

We do this by choosing the parameters $t = t_k$ in a manner to ensure that

$$\lim_{k \to \infty} y(b, t_k) = y(b) = \beta,$$

where $y(x, t_k)$ denotes the solution to the initial-value problem (11.7) with $t = t_k$, and $y(x)$
denotes the solution to the boundary-value problem (11.6).

Shooting methods for nonlinear problems require iterations to approach the “target”.

This technique is called a “shooting” method by analogy to the procedure of firing
objects at a stationary target. (See Figure 11.2.) We start with a parameter $t_0$ that determines
the initial elevation at which the object is fired from the point $(a, \alpha)$ and along the curve
described by the solution to the initial-value problem:

$$y'' = f(x, y, y'), \quad \text{for } a \le x \le b, \text{ with } y(a) = \alpha \text{ and } y'(a) = t_0.$$

Figure 11.2

If $y(b, t_0)$ is not sufficiently close to $\beta$, we correct our approximation by choosing
elevations $t_1$, $t_2$, and so on, until $y(b, t_k)$ is sufficiently close to “hitting” $\beta$. (See Figure 11.3.)

To determine the parameters $t_k$, suppose a boundary-value problem of the form (11.6)
satisfies the hypotheses of Theorem 11.1. If $y(x, t)$ denotes the solution to the initial-value
problem (11.7), we next determine $t$ with

$$y(b, t) - \beta = 0. \tag{11.8}$$

This is a nonlinear equation in the variable $t$. Problems of this type were considered in
Chapter 2, and a number of methods are available.

To use the Secant method to solve the problem, we need to choose initial approximations
$t_0$ and $t_1$, and then generate the remaining terms of the sequence by

$$t_k = t_{k-1} - \frac{(y(b, t_{k-1}) - \beta)(t_{k-1} - t_{k-2})}{y(b, t_{k-1}) - y(b, t_{k-2})}, \quad k = 2, 3, \ldots.$$

Figure 11.3
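
The Secant iteration above can be sketched as follows. This is an illustrative sketch under stated assumptions, not an algorithm from the text: the helper names `rk4_system`, `shoot`, and `secant_shooting` are hypothetical, the starting values $t_0 = (\beta - \alpha)/(b - a)$ and $t_1 = t_0 + (\beta - y(b, t_0))/(b - a)$ are one reasonable choice, and the test problem $y'' = 2y^3$, $y(-1) = \frac{1}{2}$, $y(0) = \frac{1}{3}$ (exact solution $y = 1/(x + 3)$, so the exact slope is $y'(-1) = -\frac{1}{4}$) is chosen only because it is mildly nonlinear.

```python
def rk4_system(f, x0, y0, h, n):
    """Classical fourth-order Runge-Kutta for y' = f(x, y); returns all n+1 states."""
    states = [list(y0)]
    for i in range(n):
        x, y = x0 + i * h, states[-1]
        k1 = f(x, y)
        k2 = f(x + h / 2, [yj + h / 2 * kj for yj, kj in zip(y, k1)])
        k3 = f(x + h / 2, [yj + h / 2 * kj for yj, kj in zip(y, k2)])
        k4 = f(x + h, [yj + h * kj for yj, kj in zip(y, k3)])
        states.append([yj + h / 6 * (c1 + 2 * c2 + 2 * c3 + c4)
                       for yj, c1, c2, c3, c4 in zip(y, k1, k2, k3, k4)])
    return states

def shoot(f, a, b, alpha, t, n):
    """Integrate y'' = f(x, y, y') with y(a) = alpha, y'(a) = t; returns [y, y'] states."""
    return rk4_system(lambda x, s: [s[1], f(x, s[0], s[1])],
                      a, [alpha, t], (b - a) / n, n)

def secant_shooting(f, a, b, alpha, beta, n, tol=1e-8, max_iter=25):
    """Find t with y(b, t) = beta by the Secant method; returns (t, states)."""
    t_prev = (beta - alpha) / (b - a)
    y_prev = shoot(f, a, b, alpha, t_prev, n)[-1][0]
    t = t_prev + (beta - y_prev) / (b - a)
    for _ in range(max_iter):
        states = shoot(f, a, b, alpha, t, n)
        y_b = states[-1][0]
        if abs(y_b - beta) <= tol:
            return t, states
        if y_b == y_prev:
            raise ZeroDivisionError("secant denominator vanished")
        # Right-hand side is evaluated with the old t and t_prev before assignment.
        t, t_prev, y_prev = (t - (y_b - beta) * (t - t_prev) / (y_b - y_prev),
                             t, y_b)
    raise RuntimeError("Maximum number of iterations exceeded")

n = 20
t_star, states = secant_shooting(lambda x, y, yp: 2 * y ** 3,
                                 -1.0, 0.0, 0.5, 1.0 / 3.0, n)
h = 1.0 / n
max_err = max(abs(s[0] - 1.0 / ((-1.0 + i * h) + 3.0))
              for i, s in enumerate(states))
```

Only values of $y(b, t)$ are needed, so each iteration costs a single initial-value solve.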


Newton Iteration

To use the more powerful Newton's method to generate the sequence $\{t_k\}$, only one initial
approximation, $t_0$, is needed. However, the iteration has the form

$$t_k = t_{k-1} - \frac{y(b, t_{k-1}) - \beta}{\frac{dy}{dt}(b, t_{k-1})}, \tag{11.9}$$

and it requires the knowledge of $(dy/dt)(b, t_{k-1})$. This presents a difficulty because an
explicit representation for $y(b, t)$ is not known; we know only the values $y(b, t_0)$, $y(b, t_1)$,
$\ldots$, $y(b, t_{k-1})$.

Suppose we rewrite the initial-value problem (11.7), emphasizing that the solution
depends on both $x$ and the parameter $t$:

$$y''(x, t) = f(x, y(x, t), y'(x, t)), \quad \text{for } a \le x \le b, \text{ with } y(a, t) = \alpha \text{ and } y'(a, t) = t. \tag{11.10}$$

We have retained the prime notation to indicate differentiation with respect to $x$. We need
to determine $(dy/dt)(b, t)$ when $t = t_{k-1}$, so we first take the partial derivative of (11.10)
with respect to $t$. This implies that

$$\frac{\partial y''}{\partial t}(x, t) = \frac{\partial f}{\partial t}(x, y(x, t), y'(x, t))$$

$$= \frac{\partial f}{\partial x}(x, y(x, t), y'(x, t))\frac{\partial x}{\partial t} + \frac{\partial f}{\partial y}(x, y(x, t), y'(x, t))\frac{\partial y}{\partial t}(x, t) + \frac{\partial f}{\partial y'}(x, y(x, t), y'(x, t))\frac{\partial y'}{\partial t}(x, t).$$

Since $x$ and $t$ are independent, we have $\partial x/\partial t = 0$ and the equation simplifies to

$$\frac{\partial y''}{\partial t}(x, t) = \frac{\partial f}{\partial y}(x, y(x, t), y'(x, t))\frac{\partial y}{\partial t}(x, t) + \frac{\partial f}{\partial y'}(x, y(x, t), y'(x, t))\frac{\partial y'}{\partial t}(x, t), \tag{11.11}$$

for $a \le x \le b$. The initial conditions give

$$\frac{\partial y}{\partial t}(a, t) = 0 \quad \text{and} \quad \frac{\partial y'}{\partial t}(a, t) = 1.$$

If we simplify the notation by using $z(x, t)$ to denote $(\partial y/\partial t)(x, t)$ and assume that
the order of differentiation of $x$ and $t$ can be reversed, (11.11) with the initial conditions
becomes the initial-value problem

$$z''(x, t) = \frac{\partial f}{\partial y}(x, y, y')z(x, t) + \frac{\partial f}{\partial y'}(x, y, y')z'(x, t), \quad \text{for } a \le x \le b, \tag{11.12}$$

with $z(a, t) = 0$ and $z'(a, t) = 1$.

Newton's method therefore requires that two initial-value problems, (11.10) and
(11.12), be solved for each iteration. Then from Eq. (11.9), we have

$$t_k = t_{k-1} - \frac{y(b, t_{k-1}) - \beta}{z(b, t_{k-1})}. \tag{11.13}$$

Of course, none of these initial-value problems is solved exactly; the solutions are approximated
by one of the methods discussed in Chapter 5. Algorithm 11.2 uses the Runge-Kutta
method of order four to approximate both solutions required by Newton's method. A similar
procedure for the Secant method is considered in Exercise 5.
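
The claim that $z(b, t)$ from (11.12) is the derivative needed in (11.9) can be checked numerically. The sketch below is an illustration under assumptions, not part of Algorithm 11.2: it uses the test problem $y'' = 2y^3$ on $[-1, 0]$ with $y(-1) = \frac{1}{2}$ (so $f_y = 6y^2$ and $f_{y'} = 0$), applies the classical Runge-Kutta method to the combined system for $(y, y', z, z')$, and compares $z(b, t)$ with a centered difference quotient of $y(b, t)$ in $t$. The names `rk4_final` and `endpoint` are hypothetical.

```python
def rk4_final(f, x0, y0, h, n):
    """Classical fourth-order Runge-Kutta for y' = f(x, y); returns the final state."""
    y = list(y0)
    for i in range(n):
        x = x0 + i * h
        k1 = f(x, y)
        k2 = f(x + h / 2, [yj + h / 2 * kj for yj, kj in zip(y, k1)])
        k3 = f(x + h / 2, [yj + h / 2 * kj for yj, kj in zip(y, k2)])
        k4 = f(x + h, [yj + h * kj for yj, kj in zip(y, k3)])
        y = [yj + h / 6 * (c1 + 2 * c2 + 2 * c3 + c4)
             for yj, c1, c2, c3, c4 in zip(y, k1, k2, k3, k4)]
    return y

def endpoint(t, n=20):
    """Return (y(b, t), z(b, t)) for y'' = 2y^3, y(-1) = 1/2, y'(-1) = t, b = 0."""
    # State s = [y, y', z, z']; z'' = f_y z + f_{y'} z' with f_y = 6 y^2, f_{y'} = 0.
    rhs = lambda x, s: [s[1], 2 * s[0] ** 3, s[3], 6 * s[0] ** 2 * s[2]]
    y_b, _, z_b, _ = rk4_final(rhs, -1.0, [0.5, t, 0.0, 1.0], 1.0 / n, n)
    return y_b, z_b

t, eps = -0.25, 1e-4
y_b, z_b = endpoint(t)
# Centered difference quotient of y(b, t) with respect to the slope parameter t.
fd = (endpoint(t + eps)[0] - endpoint(t - eps)[0]) / (2 * eps)
```

The two values `z_b` and `fd` should agree to several digits, which is the property that makes (11.13) a usable form of Newton's method.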


Nonlinear Shooting with Newton's Method
To approximate the solution of the nonlinear boundary-value problem

$$y'' = f(x, y, y'), \quad \text{for } a \le x \le b, \text{ with } y(a) = \alpha \text{ and } y(b) = \beta:$$

(Note: Equations (11.10) and (11.12) are written as first-order systems and solved.)

INPUT endpoints $a$, $b$; boundary conditions $\alpha$, $\beta$; number of subintervals $N \ge 2$; tolerance
TOL; maximum number of iterations $M$.

OUTPUT approximations $w_{1,i}$ to $y(x_i)$; $w_{2,i}$ to $y'(x_i)$ for each $i = 0, 1, \ldots, N$ or a message
that the maximum number of iterations was exceeded.

Step 1 Set $h = (b - a)/N$;
           $k = 1$;
           $TK = (\beta - \alpha)/(b - a)$. (Note: TK could also be input.)

Step 2 While ($k \le M$) do Steps 3-10.

    Step 3 Set $w_{1,0} = \alpha$;
               $w_{2,0} = TK$;
               $u_1 = 0$;
               $u_2 = 1$.

    Step 4 For $i = 1, \ldots, N$ do Steps 5 and 6.
           (The Runge-Kutta method for systems is used in Steps 5 and 6.)

        Step 5 Set $x = a + (i - 1)h$.

        Step 6 Set $k_{1,1} = h w_{2,i-1}$;
                   $k_{1,2} = h f(x, w_{1,i-1}, w_{2,i-1})$;
                   $k_{2,1} = h(w_{2,i-1} + \frac{1}{2}k_{1,2})$;
                   $k_{2,2} = h f(x + h/2, w_{1,i-1} + \frac{1}{2}k_{1,1}, w_{2,i-1} + \frac{1}{2}k_{1,2})$;
                   $k_{3,1} = h(w_{2,i-1} + \frac{1}{2}k_{2,2})$;
                   $k_{3,2} = h f(x + h/2, w_{1,i-1} + \frac{1}{2}k_{2,1}, w_{2,i-1} + \frac{1}{2}k_{2,2})$;
                   $k_{4,1} = h(w_{2,i-1} + k_{3,2})$;
                   $k_{4,2} = h f(x + h, w_{1,i-1} + k_{3,1}, w_{2,i-1} + k_{3,2})$;
                   $w_{1,i} = w_{1,i-1} + (k_{1,1} + 2k_{2,1} + 2k_{3,1} + k_{4,1})/6$;
                   $w_{2,i} = w_{2,i-1} + (k_{1,2} + 2k_{2,2} + 2k_{3,2} + k_{4,2})/6$;
                   $k'_{1,1} = h u_2$;
                   $k'_{1,2} = h[f_y(x, w_{1,i-1}, w_{2,i-1})u_1 + f_{y'}(x, w_{1,i-1}, w_{2,i-1})u_2]$;
                   $k'_{2,1} = h[u_2 + \frac{1}{2}k'_{1,2}]$;
                   $k'_{2,2} = h[f_y(x + h/2, w_{1,i-1}, w_{2,i-1})(u_1 + \frac{1}{2}k'_{1,1}) + f_{y'}(x + h/2, w_{1,i-1}, w_{2,i-1})(u_2 + \frac{1}{2}k'_{1,2})]$;
                   $k'_{3,1} = h(u_2 + \frac{1}{2}k'_{2,2})$;
                   $k'_{3,2} = h[f_y(x + h/2, w_{1,i-1}, w_{2,i-1})(u_1 + \frac{1}{2}k'_{2,1}) + f_{y'}(x + h/2, w_{1,i-1}, w_{2,i-1})(u_2 + \frac{1}{2}k'_{2,2})]$;
                   $k'_{4,1} = h(u_2 + k'_{3,2})$;
                   $k'_{4,2} = h[f_y(x + h, w_{1,i-1}, w_{2,i-1})(u_1 + k'_{3,1}) + f_{y'}(x + h, w_{1,i-1}, w_{2,i-1})(u_2 + k'_{3,2})]$;
                   $u_1 = u_1 + \frac{1}{6}[k'_{1,1} + 2k'_{2,1} + 2k'_{3,1} + k'_{4,1}]$;
                   $u_2 = u_2 + \frac{1}{6}[k'_{1,2} + 2k'_{2,2} + 2k'_{3,2} + k'_{4,2}]$.

    Step 7 If $|w_{1,N} - \beta| \le TOL$ then do Steps 8 and 9.

        Step 8 For $i = 0, 1, \ldots, N$
                   set $x = a + ih$;
                   OUTPUT $(x, w_{1,i}, w_{2,i})$.

        Step 9 (The procedure is complete.)
               STOP.

    Step 10 Set $TK = TK - \dfrac{w_{1,N} - \beta}{u_1}$;
            (Newton's method is used to compute TK.)
            $k = k + 1$.

Step 11 OUTPUT ('Maximum number of iterations exceeded');
        (The procedure was unsuccessful.)
        STOP.

The value $t_0 = TK$ selected in Step 1 is the slope of the straight line through $(a, \alpha)$
and $(b, \beta)$. If the problem satisfies the hypotheses of Theorem 11.1, any choice of $t_0$ will
give convergence, but a good choice of $t_0$ will improve convergence, and the procedure will
even work for many problems that do not satisfy these hypotheses. One such example can
be found in Exercise 3(d).

Example 1   Apply the Shooting method with Newton's Method to the boundary-value problem

$$y'' = \frac{1}{8}(32 + 2x^3 - yy'), \quad \text{for } 1 \le x \le 3, \text{ with } y(1) = 17 \text{ and } y(3) = \frac{43}{3}.$$

Use $N = 20$, $M = 10$, and $TOL = 10^{-5}$, and compare the results with the exact solution
$y(x) = x^2 + 16/x$.

Solution   We need approximate solutions to the initial-value problems

$$y'' = \frac{1}{8}(32 + 2x^3 - yy'), \quad \text{for } 1 \le x \le 3, \text{ with } y(1) = 17 \text{ and } y'(1) = t_k,$$

and

$$z'' = \frac{\partial f}{\partial y}z + \frac{\partial f}{\partial y'}z' = -\frac{1}{8}(y'z + yz'), \quad \text{for } 1 \le x \le 3, \text{ with } z(1) = 0 \text{ and } z'(1) = 1,$$

at each step in the iteration. If the stopping technique in Algorithm 11.2 requires

$$|w_{1,N}(t_k) - y(3)| \le 10^{-5},$$

then we need four iterations and $t_4 = -14.000203$. The results obtained for this value of $t$
are shown in Table 11.2.

Table 11.2

x_i    w_{1,i}      y(x_i)       |w_{1,i} − y(x_i)|
1.0    17.000000    17.000000
1.1    15.755495    15.755455    4.06 × 10^-5
1.2    14.773389    14.773333    5.60 × 10^-5
1.3    13.997752    13.997692    5.94 × 10^-5
1.4    13.388629    13.388571    5.71 × 10^-5
1.5    12.916719    12.916667    5.23 × 10^-5
1.6    12.560046    12.560000    4.64 × 10^-5
1.7    12.301805    12.301765    4.02 × 10^-5
1.8    12.128923    12.128889    3.14 × 10^-5
1.9    12.031081    12.031053    2.84 × 10^-5
2.0    12.000023    12.000000    2.32 × 10^-5
2.1    12.029066    12.029048    1.84 × 10^-5
2.2    12.112741    12.112727    1.40 × 10^-5
2.3    12.246532    12.246522    1.01 × 10^-5
2.4    12.426673    12.426667    6.68 × 10^-6
2.5    12.650004    12.650000    3.61 × 10^-6
2.6    12.913847    12.913845    9.17 × 10^-7
2.7    13.215924    13.215926    1.43 × 10^-6
2.8    13.554282    13.554286    3.46 × 10^-6
2.9    13.927236    13.927241    5.21 × 10^-6
3.0    14.333327    14.333333    6.69 × 10^-6
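
The iteration used in this example can be sketched in Python as follows. The function name `newton_shooting` and its argument names are hypothetical; the body is an attempt to follow the steps of Algorithm 11.2, including the evaluation of $f_y$ and $f_{y'}$ at $(w_{1,i-1}, w_{2,i-1})$ in every stage, so it should be read as a sketch rather than a reference implementation.

```python
def newton_shooting(f, fy, fyp, a, b, alpha, beta, n, tol=1e-5, max_iter=10):
    """Shooting with Newton's method for y'' = f(x, y, y'), y(a) = alpha, y(b) = beta.

    fy and fyp are the partial derivatives of f with respect to y and y'.
    Returns (t, w1, w2): the accepted slope and the approximations to y and y'.
    """
    h = (b - a) / n
    t = (beta - alpha) / (b - a)          # slope of the line through the end points
    for _ in range(max_iter):
        w1, w2 = [alpha], [t]
        u1, u2 = 0.0, 1.0                 # approximations to z and z'
        for i in range(n):
            x, y, yp = a + i * h, w1[-1], w2[-1]
            # Runge-Kutta stages for the system y' = y', (y')' = f.
            k11 = h * yp
            k12 = h * f(x, y, yp)
            k21 = h * (yp + k12 / 2)
            k22 = h * f(x + h / 2, y + k11 / 2, yp + k12 / 2)
            k31 = h * (yp + k22 / 2)
            k32 = h * f(x + h / 2, y + k21 / 2, yp + k22 / 2)
            k41 = h * (yp + k32)
            k42 = h * f(x + h, y + k31, yp + k32)
            w1.append(y + (k11 + 2 * k21 + 2 * k31 + k41) / 6)
            w2.append(yp + (k12 + 2 * k22 + 2 * k32 + k42) / 6)

            # Stages for z'' = f_y z + f_{y'} z', with y and y' frozen at step i.
            def g(xs, z, zp):
                return fy(xs, y, yp) * z + fyp(xs, y, yp) * zp
            m11 = h * u2
            m12 = h * g(x, u1, u2)
            m21 = h * (u2 + m12 / 2)
            m22 = h * g(x + h / 2, u1 + m11 / 2, u2 + m12 / 2)
            m31 = h * (u2 + m22 / 2)
            m32 = h * g(x + h / 2, u1 + m21 / 2, u2 + m22 / 2)
            m41 = h * (u2 + m32)
            m42 = h * g(x + h, u1 + m31, u2 + m32)
            u1 += (m11 + 2 * m21 + 2 * m31 + m41) / 6
            u2 += (m12 + 2 * m22 + 2 * m32 + m42) / 6
        if abs(w1[-1] - beta) <= tol:
            return t, w1, w2
        t -= (w1[-1] - beta) / u1         # Newton update, Eq. (11.13)
    raise RuntimeError("Maximum number of iterations exceeded")

# Data of Example 1: f = (32 + 2x^3 - y y')/8, f_y = -y'/8, f_{y'} = -y/8.
t_final, w1, w2 = newton_shooting(
    lambda x, y, yp: (32 + 2 * x ** 3 - y * yp) / 8,
    lambda x, y, yp: -yp / 8,
    lambda x, y, yp: -y / 8,
    1.0, 3.0, 17.0, 43.0 / 3.0, 20)
max_err = max(abs(w - ((1 + 0.1 * i) ** 2 + 16 / (1 + 0.1 * i)))
              for i, w in enumerate(w1))
```

The returned `t_final` can be compared with the value $t_4$ reported above, and `max_err` with the last column of Table 11.2.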


Although Newton's method used with the shooting technique requires the solution of
an additional initial-value problem, it will generally give faster convergence than the Secant
method. However, both methods are only locally convergent because they require good
initial approximations.

For a general discussion of the convergence of the shooting techniques for nonlinear
problems, the reader is referred to the excellent book by Keller [Keller, H]. In that reference,
more general boundary conditions are discussed. It is also noted that the shooting technique
for nonlinear problems is sensitive to round-off errors, especially if the solutions $y(x)$ and
$z(x, t)$ are rapidly increasing functions of $x$ on $[a, b]$.

EXERCISE SET 11.2

1. Use the Nonlinear Shooting Algorithm with $h = 0.5$ to approximate the solution to the boundary-value
   problem

   $$y'' = -(y')^2 - y + \ln x, \quad 1 \le x \le 2, \quad y(1) = 0, \quad y(2) = \ln 2.$$

   Compare your results to the actual solution $y = \ln x$.

2. Use the Nonlinear Shooting Algorithm with $h = 0.25$ to approximate the solution to the boundary-value
   problem

   $$y'' = 2y^3, \quad -1 \le x \le 0, \quad y(-1) = \frac{1}{2}, \quad y(0) = \frac{1}{3}.$$

   Compare your results to the actual solution $y(x) = 1/(x + 3)$.

3. Use the Nonlinear Shooting method with $TOL = 10^{-4}$ to approximate the solution to the following
   boundary-value problems. The actual solution is given for comparison to your results.
   a. $y'' = -e^{-2y}$, $1 \le x \le 2$, $y(1) = 0$, $y(2) = \ln 2$; use $N = 10$; actual solution $y(x) = \ln x$.
   b. $y'' = y'\cos x - y\ln y$, $0 \le x \le \frac{\pi}{2}$, $y(0) = 1$, $y(\frac{\pi}{2}) = e$; use $N = 10$; actual solution
      $y(x) = e^{\sin x}$.
   c. $y'' = -(2(y')^3 + y^2y')\sec x$, $\frac{\pi}{4} \le x \le \frac{\pi}{3}$, $y(\frac{\pi}{4}) = 2^{-1/4}$, $y(\frac{\pi}{3}) = \frac{1}{2}\sqrt[4]{12}$; use $N = 5$; actual
      solution $y(x) = \sqrt{\sin x}$.
   d. $y'' = \frac{1}{2}(1 - (y')^2 - y\sin x)$, $0 \le x \le \pi$, $y(0) = 2$, $y(\pi) = 2$; use $N = 20$; actual solution
      $y(x) = 2 + \sin x$.

4. Use the Nonlinear Shooting method with $TOL = 10^{-4}$ to approximate the solution to the following
   boundary-value problems. The actual solution is given for comparison to your results.
   a. $y'' = y^3 - yy'$, $1 \le x \le 2$, $y(1) = \frac{1}{2}$, $y(2) = \frac{1}{3}$; use $h = 0.1$; actual solution $y(x) = (x + 1)^{-1}$.
   b. $y'' = 2y^3 - 6y - 2x^3$, $1 \le x \le 2$, $y(1) = 2$, $y(2) = \frac{5}{2}$; use $h = 0.1$; actual solution
      $y(x) = x + x^{-1}$.
   c. $y'' = y' + 2(y - \ln x)^3 - x^{-1}$, $2 \le x \le 3$, $y(2) = \frac{1}{2} + \ln 2$, $y(3) = \frac{1}{3} + \ln 3$; use $h = 0.1$; actual
      solution $y(x) = x^{-1} + \ln x$.
   d. $y'' = (y')^2x^{-3} - 9y^2x^{-5} + 4x$, $1 \le x \le 2$, $y(1) = 0$, $y(2) = \ln 256$; use $h = 0.05$; actual
      solution $y(x) = x^3\ln x$.

5. a. Change Algorithm 11.2 to incorporate the Secant method instead of Newton's method. Use
      $t_0 = (\beta - \alpha)/(b - a)$ and $t_1 = t_0 + (\beta - y(b, t_0))/(b - a)$.
   b. Repeat Exercise 4(a) and 4(c) using the Secant algorithm derived in part (a), and compare the
      number of iterations required for the two methods.

6. The Van der Pol equation,

   $$y'' - \mu(y^2 - 1)y' + y = 0, \quad \mu > 0,$$

   governs the flow of current in a vacuum tube with three internal elements. Let $\mu = \frac{1}{2}$, $y(0) = 0$, and
   $y(2) = 1$. Approximate the solution $y(t)$ for $t = 0.2i$, where $1 \le i \le 9$.

11.3 Finite-Difference Methods for Linear Problems

The linear and nonlinear Shooting methods for boundary-value problems can present problems
of instability. The methods in this section have better stability characteristics, but they
generally require more computation to obtain a specified accuracy.

Methods involving finite differences for solving boundary-value problems replace each
of the derivatives in the differential equation with an appropriate difference-quotient
approximation of the type considered in Section 4.1. The particular difference quotient and
step size $h$ are chosen to maintain a specified order of truncation error. However, $h$ cannot
be chosen too small because of the general instability of the derivative approximations.

Discrete Approximation

The finite difference method for the linear second-order boundary-value problem,

$$y'' = p(x)y' + q(x)y + r(x), \quad \text{for } a \le x \le b, \text{ with } y(a) = \alpha \text{ and } y(b) = \beta, \tag{11.14}$$

requires that difference-quotient approximations be used to approximate both $y'$ and $y''$. First,
we select an integer $N > 0$ and divide the interval $[a, b]$ into $(N + 1)$ equal subintervals whose
endpoints are the mesh points $x_i = a + ih$, for $i = 0, 1, \ldots, N + 1$, where $h = (b - a)/(N + 1)$.
Choosing the step size $h$ in this manner facilitates the application of a matrix algorithm from
Chapter 6, which solves a linear system involving an $N \times N$ matrix.

At the interior mesh points, $x_i$, for $i = 1, 2, \ldots, N$, the differential equation to be
approximated is

$$y''(x_i) = p(x_i)y'(x_i) + q(x_i)y(x_i) + r(x_i). \tag{11.15}$$

Expanding $y$ in a third Taylor polynomial about $x_i$ evaluated at $x_{i+1}$ and $x_{i-1}$, we have,
assuming that $y \in C^4[x_{i-1}, x_{i+1}]$,

$$y(x_{i+1}) = y(x_i + h) = y(x_i) + hy'(x_i) + \frac{h^2}{2}y''(x_i) + \frac{h^3}{6}y'''(x_i) + \frac{h^4}{24}y^{(4)}(\xi_i^+),$$

for some $\xi_i^+$ in $(x_i, x_{i+1})$, and

$$y(x_{i-1}) = y(x_i - h) = y(x_i) - hy'(x_i) + \frac{h^2}{2}y''(x_i) - \frac{h^3}{6}y'''(x_i) + \frac{h^4}{24}y^{(4)}(\xi_i^-),$$

for some $\xi_i^-$ in $(x_{i-1}, x_i)$. If these equations are added, we have

$$y(x_{i+1}) + y(x_{i-1}) = 2y(x_i) + h^2y''(x_i) + \frac{h^4}{24}\left[y^{(4)}(\xi_i^+) + y^{(4)}(\xi_i^-)\right],$$

and solving for $y''(x_i)$ gives

$$y''(x_i) = \frac{1}{h^2}\left[y(x_{i+1}) - 2y(x_i) + y(x_{i-1})\right] - \frac{h^2}{24}\left[y^{(4)}(\xi_i^+) + y^{(4)}(\xi_i^-)\right].$$

The Intermediate Value Theorem 1.11 can be used to simplify the error term to give

$$y''(x_i) = \frac{1}{h^2}\left[y(x_{i+1}) - 2y(x_i) + y(x_{i-1})\right] - \frac{h^2}{12}y^{(4)}(\xi_i), \tag{11.16}$$

for some $\xi_i$ in $(x_{i-1}, x_{i+1})$. This is called the centered-difference formula for $y''(x_i)$.

A centered-difference formula for $y'(x_i)$ is obtained in a similar manner (the details
were considered in Section 4.1), resulting in

$$y'(x_i) = \frac{1}{2h}\left[y(x_{i+1}) - y(x_{i-1})\right] - \frac{h^2}{6}y'''(\eta_i), \tag{11.17}$$

for some $\eta_i$ in $(x_{i-1}, x_{i+1})$.

The use of these centered-difference formulas in Eq. (11.15) results in the equation

$$\frac{y(x_{i+1}) - 2y(x_i) + y(x_{i-1})}{h^2} = p(x_i)\left[\frac{y(x_{i+1}) - y(x_{i-1})}{2h}\right] + q(x_i)y(x_i) + r(x_i) - \frac{h^2}{12}\left[2p(x_i)y'''(\eta_i) - y^{(4)}(\xi_i)\right].$$

A Finite-Difference method with truncation error of order $O(h^2)$ results by using this
equation together with the boundary conditions $y(a) = \alpha$ and $y(b) = \beta$ to define the system
of linear equations

$$w_0 = \alpha, \quad w_{N+1} = \beta$$

and

$$\left(\frac{-w_{i+1} + 2w_i - w_{i-1}}{h^2}\right) + p(x_i)\left(\frac{w_{i+1} - w_{i-1}}{2h}\right) + q(x_i)w_i = -r(x_i), \tag{11.18}$$

for each $i = 1, 2, \ldots, N$.

In the form we will consider, Eq. (11.18) is rewritten as

$$-\left(1 + \frac{h}{2}p(x_i)\right)w_{i-1} + \left(2 + h^2q(x_i)\right)w_i - \left(1 - \frac{h}{2}p(x_i)\right)w_{i+1} = -h^2r(x_i),$$

and the resulting system of equations is expressed in the tridiagonal $N \times N$ matrix form

$$A\mathbf{w} = \mathbf{b}, \quad \text{where} \tag{11.19}$$

$$A = \begin{bmatrix}
2 + h^2q(x_1) & -1 + \frac{h}{2}p(x_1) & 0 & \cdots & 0 \\
-1 - \frac{h}{2}p(x_2) & 2 + h^2q(x_2) & -1 + \frac{h}{2}p(x_2) & \ddots & \vdots \\
0 & \ddots & \ddots & \ddots & 0 \\
\vdots & \ddots & \ddots & \ddots & -1 + \frac{h}{2}p(x_{N-1}) \\
0 & \cdots & 0 & -1 - \frac{h}{2}p(x_N) & 2 + h^2q(x_N)
\end{bmatrix},$$

$$\mathbf{w} = \begin{bmatrix} w_1 \\ w_2 \\ \vdots \\ w_{N-1} \\ w_N \end{bmatrix}, \quad \text{and} \quad
\mathbf{b} = \begin{bmatrix}
-h^2r(x_1) + \left(1 + \frac{h}{2}p(x_1)\right)w_0 \\
-h^2r(x_2) \\
\vdots \\
-h^2r(x_{N-1}) \\
-h^2r(x_N) + \left(1 - \frac{h}{2}p(x_N)\right)w_{N+1}
\end{bmatrix}.$$

The following theorem gives conditions under which the tridiagonal linear system (11.19)
has a unique solution. Its proof is a consequence of Theorem 6.31 on page 424 and is
considered in Exercise 9.

Theorem 11.3   Suppose that $p$, $q$, and $r$ are continuous on $[a, b]$. If $q(x) \ge 0$ on $[a, b]$, then the tridiagonal
linear system (11.19) has a unique solution provided that $h < 2/L$, where $L = \max_{a \le x \le b}|p(x)|$.



It should be noted that the hypotheses of Theorem 11.3 guarantee a unique solution to
the boundary-value problem (11.14), but they do not guarantee that $y \in C^4[a, b]$. We need
to establish that $y^{(4)}$ is continuous on $[a, b]$ to ensure that the truncation error has order
$O(h^2)$.
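
The $O(h^2)$ behavior of the centered-difference formulas (11.16) and (11.17) is easy to observe numerically for a smooth function. The sketch below is only an illustration: the function names `centered_first` and `centered_second`, the test function $y = \sin x$, and the point $x = 1$ are all arbitrary choices, not taken from the text. Halving $h$ should reduce each error by a factor close to 4.

```python
import math

def centered_first(y, x, h):
    """Centered-difference approximation to y'(x), as in Eq. (11.17) without the error term."""
    return (y(x + h) - y(x - h)) / (2 * h)

def centered_second(y, x, h):
    """Centered-difference approximation to y''(x), as in Eq. (11.16) without the error term."""
    return (y(x + h) - 2 * y(x) + y(x - h)) / (h * h)

x0 = 1.0
exact_first = math.cos(x0)     # derivative of sin
exact_second = -math.sin(x0)   # second derivative of sin

errors_first = [abs(centered_first(math.sin, x0, h) - exact_first)
                for h in (0.1, 0.05)]
errors_second = [abs(centered_second(math.sin, x0, h) - exact_second)
                 for h in (0.1, 0.05)]

# Ratios of successive errors; values near 4 indicate second-order accuracy.
ratio_first = errors_first[0] / errors_first[1]
ratio_second = errors_second[0] / errors_second[1]
```

If $y^{(4)}$ were not continuous, the error term in (11.16) would not be available in this form and ratios near 4 could not be expected.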

Algorithm 11.3 implements the Linear Finite-Difference method. 


Linear Finite-Difference
To approximate the solution of the boundary-value problem

$$y'' = p(x)y' + q(x)y + r(x), \quad \text{for } a \le x \le b, \text{ with } y(a) = \alpha \text{ and } y(b) = \beta:$$

INPUT endpoints $a$, $b$; boundary conditions $\alpha$, $\beta$; integer $N \ge 2$.

OUTPUT approximations $w_i$ to $y(x_i)$ for each $i = 0, 1, \ldots, N + 1$.

Step 1 Set $h = (b - a)/(N + 1)$;
           $x = a + h$;
           $a_1 = 2 + h^2q(x)$;
           $b_1 = -1 + (h/2)p(x)$;
           $d_1 = -h^2r(x) + (1 + (h/2)p(x))\alpha$.

Step 2 For $i = 2, \ldots, N - 1$
           set $x = a + ih$;
               $a_i = 2 + h^2q(x)$;
               $b_i = -1 + (h/2)p(x)$;
               $c_i = -1 - (h/2)p(x)$;
               $d_i = -h^2r(x)$.

Step 3 Set $x = b - h$;
           $a_N = 2 + h^2q(x)$;
           $c_N = -1 - (h/2)p(x)$;
           $d_N = -h^2r(x) + (1 - (h/2)p(x))\beta$.

Step 4 Set $l_1 = a_1$; (Steps 4-8 solve a tridiagonal linear system using Algorithm 6.7.)
           $u_1 = b_1/a_1$;
           $z_1 = d_1/l_1$.

Step 5 For $i = 2, \ldots, N - 1$ set $l_i = a_i - c_iu_{i-1}$;
                                      $u_i = b_i/l_i$;
                                      $z_i = (d_i - c_iz_{i-1})/l_i$.

Step 6 Set $l_N = a_N - c_Nu_{N-1}$;
           $z_N = (d_N - c_Nz_{N-1})/l_N$.

Step 7 Set $w_0 = \alpha$;
           $w_{N+1} = \beta$;
           $w_N = z_N$.

Step 8 For $i = N - 1, \ldots, 1$ set $w_i = z_i - u_iw_{i+1}$.

Step 9 For $i = 0, \ldots, N + 1$ set $x = a + ih$;
           OUTPUT $(x, w_i)$.

Step 10 STOP. (The procedure is complete.)


Example 1   Use Algorithm 11.3 with $N = 9$ to approximate the solution to the linear boundary-value
problem

$$y'' = -\frac{2}{x}y' + \frac{2}{x^2}y + \frac{\sin(\ln x)}{x^2}, \quad \text{for } 1 \le x \le 2, \text{ with } y(1) = 1 \text{ and } y(2) = 2,$$

and compare the results to those obtained using the Shooting method in Example 2 of
Section 11.1.

Solution   For this example, we will use $N = 9$, so $h = 0.1$, and we have the same spacing
as in Example 2 of Section 11.1. The complete results are listed in Table 11.3.

Table 11.3

x_i    w_i           y(x_i)        |w_i − y(x_i)|
1.0    1.00000000    1.00000000
1.1    1.09260052    1.09262930    2.88 × 10^-5
1.2    1.18704313    1.18708484    4.17 × 10^-5
1.3    1.28333687    1.28338236    4.55 × 10^-5
1.4    1.38140205    1.38144595    4.39 × 10^-5
1.5    1.48112026    1.48115942    3.92 × 10^-5
1.6    1.58235990    1.58239246    3.26 × 10^-5
1.7    1.68498902    1.68501396    2.49 × 10^-5
1.8    1.78888175    1.78889853    1.68 × 10^-5
1.9    1.89392110    1.89392951    8.41 × 10^-6
2.0    2.00000000    2.00000000

These results are considerably less accurate than those obtained in Example 2 of Section
11.1. This is because the method used in that example involved a Runge-Kutta technique
with local truncation error of order $O(h^4)$, whereas the difference method used here has
local truncation error of order $O(h^2)$.

To obtain a difference method with greater accuracy, we can proceed in a number
of ways. Using fifth-order Taylor series for approximating $y''(x_i)$ and $y'(x_i)$ results in a
truncation error term involving $h^4$. However, this process requires using multiples not only
of $y(x_{i+1})$ and $y(x_{i-1})$, but also of $y(x_{i+2})$ and $y(x_{i-2})$ in the approximation formulas for
$y''(x_i)$ and $y'(x_i)$. This leads to difficulty at $i = 0$, because we do not know $w_{-1}$, and
at $i = N$, because we do not know $w_{N+2}$. Moreover, the resulting system of equations
analogous to (11.19) is not in tridiagonal form, and the solution to the system requires
many more calculations.

Employing Richardson's Extrapolation

Instead of attempting to obtain a difference method with a higher-order truncation error in
this manner, it is generally more satisfactory to consider a reduction in step size. In addition,
Richardson's extrapolation technique can be used effectively for this method because the
error term is expressed in even powers of $h$ with coefficients independent of $h$, provided $y$
is sufficiently differentiable (see, for example, [Keller, H], p. 81).

Example 2   Apply Richardson's extrapolation to approximate the solution to the boundary-value
problem

$$y'' = -\frac{2}{x}y' + \frac{2}{x^2}y + \frac{\sin(\ln x)}{x^2}, \quad \text{for } 1 \le x \le 2, \text{ with } y(1) = 1 \text{ and } y(2) = 2,$$

using $h = 0.1$, $0.05$, and $0.025$.

Solution   The results are listed in Table 11.4. The first extrapolation is

$$\text{Ext}_{1i} = \frac{4w_i(h = 0.05) - w_i(h = 0.1)}{3},$$

the second extrapolation is

$$\text{Ext}_{2i} = \frac{4w_i(h = 0.025) - w_i(h = 0.05)}{3},$$

and the final extrapolation is

$$\text{Ext}_{3i} = \frac{16\,\text{Ext}_{2i} - \text{Ext}_{1i}}{15}.$$

Table 11.4

x_i    w_i(h = 0.05)    w_i(h = 0.025)    Ext_{1i}      Ext_{2i}      Ext_{3i}
1.0    1.00000000       1.00000000        1.00000000    1.00000000    1.00000000
1.1    1.09262207       1.09262749        1.09262925    1.09262930    1.09262930
1.2    1.18707436       1.18708222        1.18708477    1.18708484    1.18708484
1.3    1.28337094       1.28337950        1.28338230    1.28338236    1.28338236
1.4    1.38143493       1.38144319        1.38144589    1.38144595    1.38144595
1.5    1.48114959       1.48115696        1.48115937    1.48115941    1.48115942
1.6    1.58238429       1.58239042        1.58239242    1.58239246    1.58239246
1.7    1.68500770       1.68501240        1.68501393    1.68501396    1.68501396
1.8    1.78889432       1.78889748        1.78889852    1.78889853    1.78889853
1.9    1.89392740       1.89392898        1.89392950    1.89392951    1.89392951
2.0    2.00000000       2.00000000        2.00000000    2.00000000    2.00000000

The values of $w_i(h = 0.1)$ are omitted from the table to save space, but they are listed in
Table 11.3. The results for $w_i(h = 0.025)$ are accurate to approximately $3 \times 10^{-6}$. However,
the results of $\text{Ext}_{3i}$ are correct to the decimal places listed. In fact, if sufficient digits had
been used, this approximation would agree with the exact solution with maximum error of
$6.3 \times 10^{-11}$ at the mesh points, an impressive improvement.
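
The two examples of this section can be reproduced with a short Python sketch. It is offered as an illustration under assumptions rather than as the text's implementation: the function name `linear_fd` is hypothetical, and the tridiagonal system (11.19) is solved by a plain forward-elimination and back-substitution sweep instead of the Crout factorization of Algorithm 6.7, which should give the same solution up to round-off.

```python
import math

def linear_fd(p, q, r, a, b, alpha, beta, n):
    """Solve the tridiagonal system (11.19) with n interior mesh points.

    Returns (xs, ws) with xs[i] = a + i*h and ws[i] approximating y(xs[i]).
    """
    h = (b - a) / (n + 1)
    xs = [a + i * h for i in range(n + 2)]
    inner = xs[1:-1]
    diag = [2 + h * h * q(x) for x in inner]
    upper = [-1 + h / 2 * p(x) for x in inner]   # multiplies w_{i+1}
    lower = [-1 - h / 2 * p(x) for x in inner]   # multiplies w_{i-1}
    rhs = [-h * h * r(x) for x in inner]
    rhs[0] += (1 + h / 2 * p(inner[0])) * alpha
    rhs[-1] += (1 - h / 2 * p(inner[-1])) * beta
    # Forward elimination.
    for i in range(1, n):
        m = lower[i] / diag[i - 1]
        diag[i] -= m * upper[i - 1]
        rhs[i] -= m * rhs[i - 1]
    # Back substitution.
    w = [0.0] * n
    w[-1] = rhs[-1] / diag[-1]
    for i in range(n - 2, -1, -1):
        w[i] = (rhs[i] - upper[i] * w[i + 1]) / diag[i]
    return xs, [alpha] + w + [beta]

p = lambda x: -2.0 / x
q = lambda x: 2.0 / x ** 2
r = lambda x: math.sin(math.log(x)) / x ** 2
c2 = (8 - 12 * math.sin(math.log(2)) - 4 * math.cos(math.log(2))) / 70
c1 = 1.1 - c2
exact = lambda x: (c1 * x + c2 / x ** 2
                   - 0.3 * math.sin(math.log(x)) - 0.1 * math.cos(math.log(x)))

xs, w_h1 = linear_fd(p, q, r, 1.0, 2.0, 1.0, 2.0, 9)     # h = 0.1
_, w_h2 = linear_fd(p, q, r, 1.0, 2.0, 1.0, 2.0, 19)     # h = 0.05
_, w_h4 = linear_fd(p, q, r, 1.0, 2.0, 1.0, 2.0, 39)     # h = 0.025

# Richardson extrapolation at the mesh points common to all three grids.
ext1 = [(4 * w_h2[2 * i] - w_h1[i]) / 3 for i in range(11)]
ext2 = [(4 * w_h4[4 * i] - w_h2[2 * i]) / 3 for i in range(11)]
ext3 = [(16 * e2 - e1) / 15 for e1, e2 in zip(ext1, ext2)]

err_h1 = max(abs(w - exact(x)) for x, w in zip(xs, w_h1))
err_ext3 = max(abs(e - exact(x)) for x, e in zip(xs, ext3))
```

Here `err_h1` corresponds to the largest entry in the last column of Table 11.3, and `err_ext3` measures how much the three cheap tridiagonal solves gain once they are combined.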


EXERCISE SET 11.3 


1. 


The boundary-value problem 
y’=4Q-x, O<x<1, yO)=0, y)=2 


has the solution yx) = e?(e* — 1)~!(e* — e~**) + x. Use the Linear Finite-Difference method to 
approximate the solution, and compare the results to the actual solution. 

a. Withh= 4; b. Withh = 4. 

c. Use extrapolation to approximate y(1/2). 


The boundary-value problem 
y’ =y+2y+cosx, O<x<%, y0)=-03, y(¥%)=-0.1 


has the solution y(x) = — ti (sin x+3 cos x). Use the Linear Finite-Difference method to approximate 
the solution, and compare the results to the actual solution. 
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a. Withh = 7; b. Withh ==. 


c. Use extrapolation to approximate y(z/4). 


3. Use the Linear Finite-Difference Algorithm to approximate the solution to the following boundary- 
value problems. 


ay’ = —3y'+2y+2x4+3, O<x<1,y(0) =2,y(1) = 1, useh=0.1. 
boy” = —4x7!y' + 2x?y — 2x7 Inx, = 1<x<2,y(I)= —$,y(2) = In2; useh = 0.05. 
ce y’=-(xe+)Dy +2yt+(1—-x%e*, O<x<1,y() =—1, y) = 0; use hh = 0.1. 
da. y=x7!y4+3x7ytx7Inx—-1, 1<x<2,y(1)=y2)=0;useh=0.1. 
4. Although g(x) < 0 in the following boundary-value problems, unique solutions exist and are given. 


Use the Linear Finite-Difference Algorithm to approximate the solutions, and compare the results to 
the actual solutions. 


a y”+y=0, O< x < §, yO) = 1, y({) = 1; use h = §; actual solution y(x) = cosx + 
(V2 - 1) sin x. 

b.  y"+4y = cosx, O0< x < 4, yO) = 0, y(4) = 0; use h = 
-} cos 2x — 2 sin 2x + : cos x. 

"= —Ax—ly! 4 2x-2y — 2x77 Inx, y(1) = 
y(x) = 4x7! — 2x7? 4 Inx — 3/2. 

dad. y’ = 2y —y+xe*—x, O< x < 2, y(0) = 0, (2) = —4; use h = 0.2; actual solution 
y(x) = ix3e* _ 2 xe* + 2e°—x—-—2. 


39; actual solution y(x) = 


y(2) = In2; use h = 0.05; actual solution 


fe) 
< 


1 
3 


5. Use the Linear Finite-Difference Algorithm to approximate the solution y = e~!°* to the boundary- 
value problem 


y’=100y, O<x<1, yO)=1, y(l)=e™. 


Use h = 0.1 and 0.05. Can you explain the consequences? 
6. Repeat Exercise 3(a) and (b) using the extrapolation discussed in Example 2. 


7. The lead example of this chapter concerned the deflection of a beam with supported ends subject to 
uniform loading. The boundary-value problem governing this physical situation is 


= wt x-l, O<x<l, 


with boundary conditions w(0) = 0 and w(/) = 0. 

Suppose the beam is a W10-type steel I-beam with the following characteristics: length / = 120 
in., intensity of uniform load g = 100 lb/ft, modulus of elasticity E = 3.0 x 107 lb/in.”, stress at ends 
S = 1000 lb, and central moment of inertia J = 625 in.*. 


a. Approximate the deflection w(x) of the beam every 6 in. 


b. The actual relationship is given by

\[ w(x) = c_1 e^{ax} + c_2 e^{-ax} + b(x - l)x + c, \]

where $c_1 = 7.7042537 \times 10^4$, $c_2 = 7.9207462 \times 10^4$, $a = 2.3094010 \times 10^{-4}$, $b = -4.1666666 \times 10^{-3}$,
and $c = -1.5625 \times 10^5$. Is the maximum error on the interval within 0.2 in.?

c. State law requires that $\max_{0<x<l} w(x) < 1/300$. Does this beam meet state code?


8. The deflection of a uniformly loaded, long rectangular plate under an axial tension force is governed
by a second-order differential equation. Let $S$ represent the axial force and $q$ the intensity of the
uniform load. The deflection $W$ along the elemental length is given by

\[ W''(x) - \frac{S}{D}W(x) = \frac{-ql}{2D}x + \frac{q}{2D}x^2, \quad 0 \le x \le l, \quad W(0) = W(l) = 0, \]

where $l$ is the length of the plate and $D$ is the flexural rigidity of the plate. Let $q = 200$ lb/in.$^2$, $S = 100$
lb/in., $D = 8.8 \times 10^7$ lb/in., and $l = 50$ in. Approximate the deflection at 1-in. intervals.
9. Prove Theorem 11.3. [Hint: To use Theorem 6.31, first show that $\left|\frac{h}{2}p(x_i)\right| < 1$ implies that
$\left|-1 - \frac{h}{2}p(x_i)\right| + \left|-1 + \frac{h}{2}p(x_i)\right| = 2$.]
10. Show that if $y \in C^6[a, b]$ and if $w_0, w_1, \ldots, w_{N+1}$ satisfy Eq. (11.18), then

\[ w_i - y(x_i) = Ah^2 + O(h^4), \]

where $A$ is independent of $h$, provided $q(x) \ge w > 0$ on $[a, b]$ for some $w$.


11.4 Finite-Difference Methods for Nonlinear Problems


For the general nonlinear boundary-value problem

\[ y'' = f(x, y, y'), \quad \text{for } a \le x \le b, \text{ with } y(a) = \alpha \text{ and } y(b) = \beta, \]

the difference method is similar to the method applied to linear problems in Section 11.3.
Here, however, the system of equations will not be linear, so an iterative process is required
to solve it.

For the development of the procedure, we assume throughout that $f$ satisfies the following
conditions:

• $f$ and the partial derivatives $f_y$ and $f_{y'}$ are all continuous on

\[ D = \{ (x, y, y') \mid a \le x \le b, \text{ with } -\infty < y < \infty \text{ and } -\infty < y' < \infty \}; \]

• $f_y(x, y, y') \ge \delta$ on $D$, for some $\delta > 0$;

• Constants $k$ and $L$ exist, with

\[ k = \max_{(x,y,y') \in D} |f_y(x, y, y')| \quad \text{and} \quad L = \max_{(x,y,y') \in D} |f_{y'}(x, y, y')|. \]

This ensures, by Theorem 11.1, that a unique solution exists.

As in the linear case, we divide $[a, b]$ into $(N + 1)$ equal subintervals whose endpoints
are at $x_i = a + ih$, for $i = 0, 1, \ldots, N + 1$. Assuming that the exact solution has a bounded
fourth derivative allows us to replace $y''(x_i)$ and $y'(x_i)$ in each of the equations

\[ y''(x_i) = f(x_i, y(x_i), y'(x_i)) \]

by the appropriate centered-difference formula given in Eqs. (11.16) and (11.17) on page 685.
This gives, for each $i = 1, 2, \ldots, N$,

\[ \frac{y(x_{i+1}) - 2y(x_i) + y(x_{i-1})}{h^2} = f\left(x_i, y(x_i), \frac{y(x_{i+1}) - y(x_{i-1})}{2h} - \frac{h^2}{6}y'''(\eta_i)\right) + \frac{h^2}{12}y^{(4)}(\xi_i), \]

for some $\xi_i$ and $\eta_i$ in the interval $(x_{i-1}, x_{i+1})$.
As in the linear case, the difference method results from deleting the error terms and
employing the boundary conditions:

\[ w_0 = \alpha, \quad w_{N+1} = \beta, \]

and

\[ -\frac{w_{i+1} - 2w_i + w_{i-1}}{h^2} + f\left(x_i, w_i, \frac{w_{i+1} - w_{i-1}}{2h}\right) = 0, \]

for each $i = 1, 2, \ldots, N$.
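The $N \times N$ nonlinear system obtained from this method,

\[
\begin{aligned}
2w_1 - w_2 + h^2 f\left(x_1, w_1, \frac{w_2 - \alpha}{2h}\right) - \alpha &= 0, \\
-w_1 + 2w_2 - w_3 + h^2 f\left(x_2, w_2, \frac{w_3 - w_1}{2h}\right) &= 0, \\
&\;\;\vdots \\
-w_{N-2} + 2w_{N-1} - w_N + h^2 f\left(x_{N-1}, w_{N-1}, \frac{w_N - w_{N-2}}{2h}\right) &= 0, \\
-w_{N-1} + 2w_N + h^2 f\left(x_N, w_N, \frac{\beta - w_{N-1}}{2h}\right) - \beta &= 0
\end{aligned}
\tag{11.20}
\]

has a unique solution provided that $h < 2/L$, as shown in [Keller, H], p. 86.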


Newton’s Method for Iterations 


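We use Newton's method for nonlinear systems, discussed in Section 10.2, to approximate
the solution to this system. A sequence of iterates $\{(w_1^{(k)}, w_2^{(k)}, \ldots, w_N^{(k)})^t\}$ is generated
that converges to the solution of system (11.20), provided that the initial approximation
$(w_1^{(0)}, w_2^{(0)}, \ldots, w_N^{(0)})^t$ is sufficiently close to the solution $(w_1, w_2, \ldots, w_N)^t$, and that the
Jacobian matrix for the system is nonsingular. For system (11.20), the Jacobian matrix
$J(w_1, \ldots, w_N)$ is tridiagonal with $ij$-th entry

\[
J(w_1, \ldots, w_N)_{ij} =
\begin{cases}
-1 + \dfrac{h}{2} f_{y'}\left(x_i, w_i, \dfrac{w_{i+1} - w_{i-1}}{2h}\right), & \text{for } i = j - 1 \text{ and } j = 2, \ldots, N, \\[2ex]
2 + h^2 f_y\left(x_i, w_i, \dfrac{w_{i+1} - w_{i-1}}{2h}\right), & \text{for } i = j \text{ and } j = 1, \ldots, N, \\[2ex]
-1 - \dfrac{h}{2} f_{y'}\left(x_i, w_i, \dfrac{w_{i+1} - w_{i-1}}{2h}\right), & \text{for } i = j + 1 \text{ and } j = 1, \ldots, N - 1,
\end{cases}
\]

where $w_0 = \alpha$ and $w_{N+1} = \beta$.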
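As a short worked instance of these entries, take the right-hand side used in Example 1 of this section, $f(x, y, y') = \frac{1}{8}(32 + 2x^3 - yy')$. The abbreviation $t_i$ below is introduced only for this illustration.

```latex
\[
f_y(x, y, y') = -\tfrac{1}{8}y', \qquad f_{y'}(x, y, y') = -\tfrac{1}{8}y,
\qquad t_i = \frac{w_{i+1} - w_{i-1}}{2h},
\]
\[
J_{i,i} = 2 + h^2\left(-\tfrac{1}{8}t_i\right) = 2 - \frac{h}{16}\,(w_{i+1} - w_{i-1}),
\]
\[
J_{i,i+1} = -1 + \frac{h}{2}\left(-\tfrac{1}{8}w_i\right) = -1 - \frac{h}{16}\,w_i,
\qquad
J_{i,i-1} = -1 - \frac{h}{2}\left(-\tfrac{1}{8}w_i\right) = -1 + \frac{h}{16}\,w_i .
\]
```

Each row of $J$ therefore depends only on $w_{i-1}$, $w_i$, and $w_{i+1}$, which is why the matrix is tridiagonal.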
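Newton's method for nonlinear systems requires that at each iteration the $N \times N$ linear
system

\[
\begin{aligned}
J(w_1, \ldots, w_N)(v_1, \ldots, v_N)^t = -\Bigg( & 2w_1 - w_2 - \alpha + h^2 f\left(x_1, w_1, \frac{w_2 - \alpha}{2h}\right), \\
& -w_1 + 2w_2 - w_3 + h^2 f\left(x_2, w_2, \frac{w_3 - w_1}{2h}\right), \ldots, \\
& -w_{N-2} + 2w_{N-1} - w_N + h^2 f\left(x_{N-1}, w_{N-1}, \frac{w_N - w_{N-2}}{2h}\right), \\
& -w_{N-1} + 2w_N + h^2 f\left(x_N, w_N, \frac{\beta - w_{N-1}}{2h}\right) - \beta \Bigg)^t
\end{aligned}
\]

be solved for $v_1, v_2, \ldots, v_N$, since

\[ w_i^{(k)} = w_i^{(k-1)} + v_i, \quad \text{for each } i = 1, 2, \ldots, N. \]

Because $J$ is tridiagonal, this is not as formidable a problem as it might at first appear. In
particular, the Crout Factorization Algorithm 6.7 on page 424 can be applied. The process
is detailed in Algorithm 11.4.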
Nonlinear Finite-Difference

To approximate the solution to the nonlinear boundary-value problem

\[ y'' = f(x, y, y'), \quad \text{for } a \le x \le b, \text{ with } y(a) = \alpha \text{ and } y(b) = \beta: \]

INPUT endpoints $a$, $b$; boundary conditions $\alpha$, $\beta$; integer $N \ge 2$; tolerance TOL; maximum
number of iterations $M$.

OUTPUT approximations $w_i$ to $y(x_i)$ for each $i = 0, 1, \ldots, N + 1$ or a message that the
maximum number of iterations was exceeded.

Step 1 Set $h = (b - a)/(N + 1)$;
$w_0 = \alpha$;
$w_{N+1} = \beta$.

Step 2 For $i = 1, \ldots, N$ set $w_i = \alpha + i\left(\dfrac{\beta - \alpha}{b - a}\right)h$.

Step 3 Set $k = 1$.

Step 4 While $k \le M$ do Steps 5-16.

Step 5 Set $x = a + h$;
$t = (w_2 - \alpha)/(2h)$;
$a_1 = 2 + h^2 f_y(x, w_1, t)$;
$b_1 = -1 + (h/2) f_{y'}(x, w_1, t)$;
$d_1 = -(2w_1 - w_2 - \alpha + h^2 f(x, w_1, t))$.

Step 6 For $i = 2, \ldots, N - 1$
set $x = a + ih$;
$t = (w_{i+1} - w_{i-1})/(2h)$;
$a_i = 2 + h^2 f_y(x, w_i, t)$;
$b_i = -1 + (h/2) f_{y'}(x, w_i, t)$;
$c_i = -1 - (h/2) f_{y'}(x, w_i, t)$;
$d_i = -(2w_i - w_{i+1} - w_{i-1} + h^2 f(x, w_i, t))$.

Step 7 Set $x = b - h$;
$t = (\beta - w_{N-1})/(2h)$;
$a_N = 2 + h^2 f_y(x, w_N, t)$;
$c_N = -1 - (h/2) f_{y'}(x, w_N, t)$;
$d_N = -(2w_N - w_{N-1} - \beta + h^2 f(x, w_N, t))$.

Step 8 Set $l_1 = a_1$; (Steps 8-12 solve a tridiagonal linear system using Algorithm 6.7.)
$u_1 = b_1/a_1$;
$z_1 = d_1/l_1$.

Step 9 For $i = 2, \ldots, N - 1$ set $l_i = a_i - c_i u_{i-1}$;
$u_i = b_i/l_i$;
$z_i = (d_i - c_i z_{i-1})/l_i$.

Step 10 Set $l_N = a_N - c_N u_{N-1}$;
$z_N = (d_N - c_N z_{N-1})/l_N$.

Step 11 Set $v_N = z_N$;
$w_N = w_N + v_N$.

Step 12 For $i = N - 1, \ldots, 1$ set $v_i = z_i - u_i v_{i+1}$;
$w_i = w_i + v_i$.

Step 13 If $\|v\| \le$ TOL then do Steps 14 and 15.

Step 14 For $i = 0, \ldots, N + 1$ set $x = a + ih$;
OUTPUT $(x, w_i)$.

Step 15 STOP. (The procedure was successful.)

Step 16 Set $k = k + 1$.

Step 17 OUTPUT ('Maximum number of iterations exceeded');
(The procedure was unsuccessful.)
STOP.


It can be shown (see [IK], p. 433) that this Nonlinear Finite-Difference method is of
order $O(h^2)$.

A good initial approximation is required when the conditions given at the beginning of
this presentation cannot be verified, so an upper bound for the number of iterations should
be specified and, if it is exceeded, a new initial approximation or a reduction in step size
should be considered. Unless contradictory information is available, it is reasonable to begin
the procedure by assuming that the solution is linear. So the initial approximations $w_i^{(0)}$ to
$w_i$, for each $i = 1, 2, \ldots, N$, are obtained in Step 2 by passing a straight line through the
known endpoints $(a, \alpha)$ and $(b, \beta)$ and evaluating at $x_i$.
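The steps of Algorithm 11.4 can be sketched in Python as follows. This is a minimal illustration rather than a reference implementation: the function name `nonlinear_fd` and its argument names are chosen here for convenience, the partial derivatives $f_y$ and $f_{y'}$ are supplied by the caller, and the maximum norm is assumed for $\|v\|$. The sample call uses the boundary-value problem of Example 1 with $N = 19$, so that $h = 0.1$.

```python
def nonlinear_fd(f, fy, fyp, a, b, alpha, beta, N, tol=1e-8, max_iter=25):
    """Newton iteration on the nonlinear finite-difference system.

    f, fy, fyp are callables (x, y, yp) -> float giving f, df/dy and df/dy'.
    Returns the list [w_0, ..., w_{N+1}].
    """
    h = (b - a) / (N + 1)
    # Step 2: straight line through (a, alpha) and (b, beta) as initial guess.
    w = [alpha + i * h * (beta - alpha) / (b - a) for i in range(N + 2)]
    for _ in range(max_iter):
        diag = [0.0] * (N + 1)   # a_i (index 0 unused)
        sup = [0.0] * (N + 1)    # b_i, superdiagonal
        sub = [0.0] * (N + 1)    # c_i, subdiagonal
        rhs = [0.0] * (N + 1)    # d_i
        # Steps 5-7: w[0] and w[N+1] hold alpha and beta, so one loop suffices.
        for i in range(1, N + 1):
            x = a + i * h
            t = (w[i + 1] - w[i - 1]) / (2 * h)
            diag[i] = 2 + h * h * fy(x, w[i], t)
            sup[i] = -1 + (h / 2) * fyp(x, w[i], t)
            sub[i] = -1 - (h / 2) * fyp(x, w[i], t)
            rhs[i] = -(2 * w[i] - w[i + 1] - w[i - 1] + h * h * f(x, w[i], t))
        # Steps 8-10: forward sweep of the Crout factorization.
        l = [0.0] * (N + 1)
        u = [0.0] * (N + 1)
        z = [0.0] * (N + 1)
        l[1] = diag[1]
        u[1] = sup[1] / l[1]
        z[1] = rhs[1] / l[1]
        for i in range(2, N + 1):
            l[i] = diag[i] - sub[i] * u[i - 1]
            u[i] = sup[i] / l[i]
            z[i] = (rhs[i] - sub[i] * z[i - 1]) / l[i]
        # Steps 11-12: back substitution and Newton update.
        v = [0.0] * (N + 2)
        v[N] = z[N]
        for i in range(N - 1, 0, -1):
            v[i] = z[i] - u[i] * v[i + 1]
        for i in range(1, N + 1):
            w[i] += v[i]
        # Step 13: stop when the correction is small in the maximum norm.
        if max(abs(vi) for vi in v) <= tol:
            return w
    raise RuntimeError("Maximum number of iterations exceeded")


# Example 1: y'' = (32 + 2x^3 - y y')/8 on [1, 3], y(1) = 17, y(3) = 43/3.
w = nonlinear_fd(
    f=lambda x, y, yp: (32 + 2 * x**3 - y * yp) / 8,
    fy=lambda x, y, yp: -yp / 8,
    fyp=lambda x, y, yp: -y / 8,
    a=1.0, b=3.0, alpha=17.0, beta=43.0 / 3.0, N=19,
)
```

With these inputs, `w[10]` is the approximation at $x = 2.0$ and can be compared with the corresponding entry of Table 11.5.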


Example 1 Apply Algorithm 11.4, with $h = 0.1$, to the nonlinear boundary-value problem

\[ y'' = \frac{1}{8}(32 + 2x^3 - yy'), \quad \text{for } 1 \le x \le 3, \text{ with } y(1) = 17 \text{ and } y(3) = \frac{43}{3}, \]

and compare the results to those obtained in Example 1 of Section 11.2.

Solution The stopping procedure used in Algorithm 11.4 was to iterate until values of
successive iterates differed by less than $10^{-8}$. This was accomplished with four iterations.
This gives the results in Table 11.5. They are less accurate than those obtained using the
nonlinear shooting method, which gave results in the middle of the table accurate on the
order of $10^{-5}$.


Employing Richardson’s Extrapolation 


Richardson's extrapolation procedure can also be used for the Nonlinear Finite-Difference
method. Table 11.6 lists the results when this method is applied to our example using
$h = 0.1$, $0.05$, and $0.025$, with four iterations in each case. The values of $w_i(h = 0.1)$
are omitted from the table to save space, but they are listed in Table 11.5. The values of
$w_i(h = 0.025)$ are accurate to within about $1.5 \times 10^{-4}$. However, the values of $\text{Ext}_{3i}$ are all
accurate to the places listed, with an actual maximum error of $3.68 \times 10^{-10}$.
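The extrapolation arithmetic can be sketched as follows. The formulas are the standard ones for an error expansion in even powers of $h$ and are assumed here to match the extrapolation of Example 2 in Section 11.3; the function name is hypothetical. The inputs are the tabulated approximations at $x = 1.1$ for $h = 0.1$, $0.05$, and $0.025$.

```python
def richardson_three_levels(w_h, w_h2, w_h4):
    """Combine approximations computed with step sizes h, h/2 and h/4.

    Assumes the error behaves like A*h^2 + B*h^4 + ... at a fixed point x_i.
    """
    ext1 = (4 * w_h2 - w_h) / 3        # removes the h^2 term using h and h/2
    ext2 = (4 * w_h4 - w_h2) / 3       # removes the h^2 term using h/2 and h/4
    ext3 = (16 * ext2 - ext1) / 15     # removes the h^4 term
    return ext1, ext2, ext3


# Values at x = 1.1 taken from Tables 11.5 and 11.6.
ext1, ext2, ext3 = richardson_three_levels(15.754503, 15.75521721, 15.75539525)
exact = 1.1**2 + 16 / 1.1              # exact solution y(x) = x^2 + 16/x
```

Because the $h = 0.1$ value is tabulated to only six decimal places, `ext1` reproduces the table entry to roughly that accuracy, while `ext2` and `ext3` agree with the tabulated values more closely.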
Table 11.5

x_i    w_i          y(x_i)       |w_i - y(x_i)|
1.0    17.000000    17.000000
1.1    15.754503    15.755455    9.520 × 10^{-4}
1.2    14.771740    14.773333    1.594 × 10^{-3}
1.3    13.995677    13.997692    2.015 × 10^{-3}
1.4    13.386297    13.388571    2.275 × 10^{-3}
1.5    12.914252    12.916667    2.414 × 10^{-3}
1.6    12.557538    12.560000    2.462 × 10^{-3}
1.7    12.299326    12.301765    2.438 × 10^{-3}
1.8    12.126529    12.128889    2.360 × 10^{-3}
1.9    12.028814    12.031053    2.239 × 10^{-3}
2.0    11.997915    12.000000    2.085 × 10^{-3}
2.1    12.027142    12.029048    1.905 × 10^{-3}
2.2    12.111020    12.112727    1.707 × 10^{-3}
2.3    12.245025    12.246522    1.497 × 10^{-3}
2.4    12.425388    12.426667    1.278 × 10^{-3}
2.5    12.648944    12.650000    1.056 × 10^{-3}
2.6    12.913013    12.913846    8.335 × 10^{-4}
2.7    13.215312    13.215926    6.142 × 10^{-4}
2.8    13.553885    13.554286    4.006 × 10^{-4}
2.9    13.927046    13.927241    1.953 × 10^{-4}
3.0    14.333333    14.333333

Table 11.6

x_i    w_i(h = 0.05)   w_i(h = 0.025)   Ext_{1i}       Ext_{2i}       Ext_{3i}
1.0    17.00000000     17.00000000      17.00000000    17.00000000    17.00000000
1.1    15.75521721     15.75539525      15.75545543    15.75545460    15.75545455
1.2    14.77293601     14.77323407      14.77333479    14.77333342    14.77333333
1.3    13.99718996     13.99756690      13.99769413    13.99769242    13.99769231
1.4    13.38800424     13.38842973      13.38857346    13.38857156    13.38857143
1.5    12.91606471     12.91651628      12.91666881    12.91666680    12.91666667
1.6    12.55938618     12.55984665      12.56000217    12.56000014    12.56000000
1.7    12.30115670     12.30161280      12.30176684    12.30176484    12.30176471
1.8    12.12830042     12.12874287      12.12899094    12.12888902    12.12888889
1.9    12.03049438     12.03091316      12.03105457    12.03105275    12.03105263
2.0    11.99948020     11.99987013      12.00000179    12.00000011    12.00000000
2.1    12.02857252     12.02892892      12.02902924    12.02904772    12.02904762
2.2    12.11230149     12.11262089      12.11272872    12.11272736    12.11272727
2.3    12.24614846     12.24642848      12.24652299    12.24652182    12.24652174
2.4    12.42634789     12.42658702      12.42666773    12.42666673    12.42666667
2.5    12.64973666     12.64993420      12.65000086    12.65000005    12.65000000
2.6    12.91362828     12.91379422      12.91384683    12.91384620    12.91384615
2.7    13.21577275     13.21588765      13.21592641    13.21592596    13.21592593
2.8    13.55418579     13.55426075      13.55428603    13.55428573    13.55428571
2.9    13.92719268     13.92722921      13.92724153    13.92724139    13.92724138
3.0    14.33333333     14.33333333      14.33333333    14.33333333    14.33333333
EXERCISE SET 11.4

1. Use the Nonlinear Finite-Difference method with $h = 0.5$ to approximate the solution to the boundary-
value problem

\[ y'' = -(y')^2 - y + \ln x, \quad 1 \le x \le 2, \quad y(1) = 0, \quad y(2) = \ln 2. \]

Compare your results to the actual solution $y = \ln x$.

2. Use the Nonlinear Finite-Difference method with $h = 0.25$ to approximate the solution to the
boundary-value problem

\[ y'' = 2y^3, \quad -1 \le x \le 0, \quad y(-1) = \frac{1}{2}, \quad y(0) = \frac{1}{3}. \]

Compare your results to the actual solution $y(x) = 1/(x + 3)$.

3. Use the Nonlinear Finite-Difference Algorithm with TOL $= 10^{-4}$ to approximate the solution to the
following boundary-value problems. The actual solution is given for comparison to your results.

a. $y'' = -e^{-2y}$, $1 \le x \le 2$, $y(1) = 0$, $y(2) = \ln 2$; use $N = 9$; actual solution $y(x) = \ln x$.
b. $y'' = y'\cos x - y\ln y$, $0 \le x \le \frac{\pi}{2}$, $y(0) = 1$, $y(\frac{\pi}{2}) = e$; use $N = 9$; actual solution $y(x) = e^{\sin x}$.
c. $y'' = -\left(2(y')^3 + y^2y'\right)\sec x$, $\frac{\pi}{4} \le x \le \frac{\pi}{3}$, $y(\frac{\pi}{4}) = 2^{-1/4}$, $y(\frac{\pi}{3}) = \frac{1}{2}\sqrt[4]{12}$; use $N = 4$; actual
solution $y(x) = \sqrt{\sin x}$.
d. $y'' = \frac{1}{2}\left(1 - (y')^2 - y\sin x\right)$, $0 \le x \le \pi$, $y(0) = 2$, $y(\pi) = 2$; use $N = 19$; actual solution
$y(x) = 2 + \sin x$.

4. Use the Nonlinear Finite-Difference Algorithm with TOL $= 10^{-4}$ to approximate the solution to the
following boundary-value problems. The actual solution is given for comparison to your results.

a. $y'' = y^3 - yy'$, $1 \le x \le 2$, $y(1) = \frac{1}{2}$, $y(2) = \frac{1}{3}$; use $h = 0.1$; actual solution $y(x) = (x + 1)^{-1}$.

b. $y'' = 2y^3 - 6y - 2x^3$, $1 \le x \le 2$, $y(1) = 2$, $y(2) = \frac{5}{2}$; use $h = 0.1$; actual solution
$y(x) = x + x^{-1}$.

c. $y'' = y' + 2(y - \ln x)^3 - x^{-1}$, $2 \le x \le 3$, $y(2) = \frac{1}{2} + \ln 2$, $y(3) = \frac{1}{3} + \ln 3$; use $h = 0.1$;
actual solution $y(x) = x^{-1} + \ln x$.

d. $y'' = (y')^2x^{-3} - 9y^2x^{-5} + 4x$, $1 \le x \le 2$, $y(1) = 0$, $y(2) = \ln 256$; use $h = 0.05$; actual
solution $y(x) = x^3\ln x$.

5. Repeat Exercise 4(a) and 4(b) using extrapolation.

6. In Exercise 7 of Section 11.3, the deflection of a beam with supported ends subject to uniform loading
was approximated. Using a more appropriate representation of curvature gives the differential equation

\[ \left[1 + (w'(x))^2\right]^{-3/2} w''(x) = \frac{S}{EI}\,w(x) + \frac{qx}{2EI}(x - l), \quad \text{for } 0 < x < l. \]

Approximate the deflection $w(x)$ of the beam every 6 in., and compare the results to those of Exercise 7
of Section 11.3.


7. Show that the hypotheses listed at the beginning of the section ensure the nonsingularity of the Jacobian 
matrix J for h < 2/L. 


11.5 The Rayleigh-Ritz Method

The Shooting method for approximating the solution to a boundary-value problem replaced
the boundary-value problem with a pair of initial-value problems. The finite-difference
approach replaces the continuous operation of differentiation with the discrete operation of
finite differences. The Rayleigh-Ritz method is a variational technique that attacks the
problem from a third approach. The boundary-value problem is first reformulated as a problem
of choosing, from the set of all sufficiently differentiable functions satisfying the boundary
conditions, the function to minimize a certain integral. Then the set of feasible functions is
reduced in size, and an approximation is found from this set to minimize the integral. This
gives our approximation to the solution of the boundary-value problem.

John William Strutt Lord Rayleigh (1842-1919), a mathematical physicist who was
particularly interested in wave propagation, received a Nobel Prize in physics in 1904.

Walter Ritz (1878-1909), a theoretical physicist at Göttingen University, published a paper on
a variational problem in 1909 [Ri]. He died of tuberculosis at the age of 31.

To describe the Rayleigh-Ritz method, we consider approximating the solution to a
linear two-point boundary-value problem from beam-stress analysis. This boundary-value
problem is described by the differential equation

\[ -\frac{d}{dx}\left(p(x)\frac{dy}{dx}\right) + q(x)y = f(x), \quad \text{for } 0 \le x \le 1, \tag{11.21} \]

with the boundary conditions

\[ y(0) = y(1) = 0. \tag{11.22} \]

This differential equation describes the deflection $y(x)$ of a beam of length 1 with variable
cross section represented by $q(x)$. The deflection is due to the added stresses $p(x)$ and $f(x)$.
More general boundary conditions are considered in Exercises 6 and 9.

In the discussion that follows, we assume that $p \in C^1[0, 1]$ and $q, f \in C[0, 1]$. Further,
we assume that there exists a constant $\delta > 0$ such that

\[ p(x) \ge \delta, \quad \text{and that} \quad q(x) \ge 0, \quad \text{for each } x \text{ in } [0, 1]. \]

These assumptions are sufficient to guarantee that the boundary-value problem given in
(11.21) and (11.22) has a unique solution (see [BSW]).

Variational Problems

As is the case in many boundary-value problems that describe physical phenomena, the
solution to the beam equation satisfies an integral minimization variational property. The
variational principle for the beam equation is fundamental to the development of the
Rayleigh-Ritz method and characterizes the solution to the beam equation as the function that
minimizes an integral over all functions in $C_0^2[0, 1]$, the set of those functions $u$ in $C^2[0, 1]$ with
the property that $u(0) = u(1) = 0$. The following theorem gives the characterization.

Theorem 11.4

Let $p \in C^1[0, 1]$, $q, f \in C[0, 1]$, and

\[ p(x) \ge \delta > 0, \quad q(x) \ge 0, \quad \text{for } 0 \le x \le 1. \]

The function $y \in C_0^2[0, 1]$ is the unique solution to the differential equation

\[ -\frac{d}{dx}\left(p(x)\frac{dy}{dx}\right) + q(x)y = f(x), \quad \text{for } 0 \le x \le 1, \tag{11.23} \]

if and only if $y$ is the unique function in $C_0^2[0, 1]$ that minimizes the integral

\[ I[u] = \int_0^1 \left\{ p(x)[u'(x)]^2 + q(x)[u(x)]^2 - 2f(x)u(x) \right\} dx. \tag{11.24} \]


Details of the proof of this theorem can be found in [Schul], pp. 88-89. It proceeds in
three steps.

• First it is shown that any solution $y$ to (11.23) also satisfies the equation

\[ \int_0^1 f(x)u(x)\,dx = \int_0^1 p(x)\frac{dy}{dx}(x)\frac{du}{dx}(x) + q(x)y(x)u(x)\,dx, \tag{11.25} \]

for all $u \in C_0^2[0, 1]$.

• The second step shows that $y \in C_0^2[0, 1]$ is a solution to (11.24) if and only if (11.25)
holds for all $u \in C_0^2[0, 1]$.

• The final step shows that (11.25) has a unique solution. This unique solution will also be
a solution to (11.24) and to (11.23), so the solutions to (11.23) and (11.24) are identical.


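The Rayleigh-Ritz method approximates the solution $y$ by minimizing the integral, not
over all the functions in $C_0^2[0, 1]$, but over a smaller set of functions consisting of linear
combinations of certain basis functions $\phi_1, \phi_2, \ldots, \phi_n$. The basis functions are linearly
independent and satisfy

\[ \phi_i(0) = \phi_i(1) = 0, \quad \text{for each } i = 1, 2, \ldots, n. \]

An approximation $\phi(x) = \sum_{i=1}^n c_i\phi_i(x)$ to the solution $y(x)$ of Eq. (11.23) is then obtained
by finding constants $c_1, c_2, \ldots, c_n$ to minimize the integral $I\left[\sum_{i=1}^n c_i\phi_i\right]$.
From Eq. (11.24),

\[
\begin{aligned}
I[\phi] &= I\left[\sum_{i=1}^n c_i\phi_i\right] \\
&= \int_0^1 \left\{ p(x)\left[\sum_{i=1}^n c_i\phi_i'(x)\right]^2 + q(x)\left[\sum_{i=1}^n c_i\phi_i(x)\right]^2 - 2f(x)\sum_{i=1}^n c_i\phi_i(x) \right\} dx,
\end{aligned}
\tag{11.26}
\]

and, for a minimum to occur, it is necessary, when considering $I$ as a function of $c_1, c_2, \ldots, c_n$,
to have

\[ \frac{\partial I}{\partial c_j} = 0, \quad \text{for each } j = 1, 2, \ldots, n. \tag{11.27} \]

Differentiating (11.26) gives

\[ \frac{\partial I}{\partial c_j} = \int_0^1 \left\{ 2p(x)\sum_{i=1}^n c_i\phi_i'(x)\phi_j'(x) + 2q(x)\sum_{i=1}^n c_i\phi_i(x)\phi_j(x) - 2f(x)\phi_j(x) \right\} dx, \]

and substituting into Eq. (11.27) yields

\[ 0 = \sum_{i=1}^n \left[ \int_0^1 \left\{ p(x)\phi_i'(x)\phi_j'(x) + q(x)\phi_i(x)\phi_j(x) \right\} dx \right] c_i - \int_0^1 f(x)\phi_j(x)\,dx, \tag{11.28} \]

for each $j = 1, 2, \ldots, n$.
The normal equations described in Eq. (11.28) produce an $n \times n$ linear system $A\mathbf{c} = \mathbf{b}$
in the variables $c_1, c_2, \ldots, c_n$, where the symmetric matrix $A$ has

\[ a_{ij} = \int_0^1 \left[ p(x)\phi_i'(x)\phi_j'(x) + q(x)\phi_i(x)\phi_j(x) \right] dx, \]

and $\mathbf{b}$ is defined by

\[ b_i = \int_0^1 f(x)\phi_i(x)\,dx. \]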
[Figure 11.4: graph of a piecewise-linear basis function $y = \phi_i(x)$]
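Piecewise-Linear Basis

The simplest choice of basis functions involves piecewise-linear polynomials. The first step
is to form a partition of $[0, 1]$ by choosing points $x_0, x_1, \ldots, x_{n+1}$ with

\[ 0 = x_0 < x_1 < \cdots < x_n < x_{n+1} = 1. \]

Letting $h_i = x_{i+1} - x_i$, for each $i = 0, 1, \ldots, n$, we define the basis functions $\phi_1(x)$,
$\phi_2(x), \ldots, \phi_n(x)$ by

\[
\phi_i(x) =
\begin{cases}
0, & \text{if } 0 \le x \le x_{i-1}, \\[1ex]
\dfrac{1}{h_{i-1}}(x - x_{i-1}), & \text{if } x_{i-1} < x \le x_i, \\[1ex]
\dfrac{1}{h_i}(x_{i+1} - x), & \text{if } x_i < x \le x_{i+1}, \\[1ex]
0, & \text{if } x_{i+1} < x \le 1,
\end{cases}
\tag{11.29}
\]

for each $i = 1, 2, \ldots, n$. (See Figure 11.4.)

The functions $\phi_i$ are piecewise-linear, so the derivatives $\phi_i'$, while not continuous, are
constant on $(x_j, x_{j+1})$, for each $j = 0, 1, \ldots, n$, and

\[
\phi_i'(x) =
\begin{cases}
0, & \text{if } 0 < x < x_{i-1}, \\[1ex]
\dfrac{1}{h_{i-1}}, & \text{if } x_{i-1} < x < x_i, \\[1ex]
-\dfrac{1}{h_i}, & \text{if } x_i < x < x_{i+1}, \\[1ex]
0, & \text{if } x_{i+1} < x < 1,
\end{cases}
\tag{11.30}
\]

for each $i = 1, 2, \ldots, n$.
Because $\phi_i$ and $\phi_i'$ are nonzero only on $(x_{i-1}, x_{i+1})$,

\[ \phi_i(x)\phi_j(x) \equiv 0 \quad \text{and} \quad \phi_i'(x)\phi_j'(x) \equiv 0, \]

except when $j$ is $i - 1$, $i$, or $i + 1$. As a consequence, the linear system given by (11.28)
reduces to an $n \times n$ tridiagonal linear system. The nonzero entries in $A$ are

\[
\begin{aligned}
a_{ii} &= \int_0^1 \left\{ p(x)[\phi_i'(x)]^2 + q(x)[\phi_i(x)]^2 \right\} dx \\
&= \left(\frac{1}{h_{i-1}}\right)^2 \int_{x_{i-1}}^{x_i} p(x)\,dx + \left(\frac{-1}{h_i}\right)^2 \int_{x_i}^{x_{i+1}} p(x)\,dx \\
&\quad + \left(\frac{1}{h_{i-1}}\right)^2 \int_{x_{i-1}}^{x_i} (x - x_{i-1})^2 q(x)\,dx + \left(\frac{1}{h_i}\right)^2 \int_{x_i}^{x_{i+1}} (x_{i+1} - x)^2 q(x)\,dx,
\end{aligned}
\]

for each $i = 1, 2, \ldots, n$;

\[
\begin{aligned}
a_{i,i+1} &= \int_0^1 \left\{ p(x)\phi_i'(x)\phi_{i+1}'(x) + q(x)\phi_i(x)\phi_{i+1}(x) \right\} dx \\
&= -\left(\frac{1}{h_i}\right)^2 \int_{x_i}^{x_{i+1}} p(x)\,dx + \left(\frac{1}{h_i}\right)^2 \int_{x_i}^{x_{i+1}} (x_{i+1} - x)(x - x_i) q(x)\,dx,
\end{aligned}
\]

for each $i = 1, 2, \ldots, n - 1$; and

\[
\begin{aligned}
a_{i,i-1} &= \int_0^1 \left\{ p(x)\phi_i'(x)\phi_{i-1}'(x) + q(x)\phi_i(x)\phi_{i-1}(x) \right\} dx \\
&= -\left(\frac{1}{h_{i-1}}\right)^2 \int_{x_{i-1}}^{x_i} p(x)\,dx + \left(\frac{1}{h_{i-1}}\right)^2 \int_{x_{i-1}}^{x_i} (x_i - x)(x - x_{i-1}) q(x)\,dx,
\end{aligned}
\]

for each $i = 2, \ldots, n$. The entries in $\mathbf{b}$ are

\[ b_i = \int_0^1 f(x)\phi_i(x)\,dx = \frac{1}{h_{i-1}} \int_{x_{i-1}}^{x_i} (x - x_{i-1}) f(x)\,dx + \frac{1}{h_i} \int_{x_i}^{x_{i+1}} (x_{i+1} - x) f(x)\,dx, \]

for each $i = 1, 2, \ldots, n$.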
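There are six types of integrals to be evaluated:

\[
\begin{aligned}
Q_{1,i} &= \left(\frac{1}{h_i}\right)^2 \int_{x_i}^{x_{i+1}} (x_{i+1} - x)(x - x_i) q(x)\,dx, & \text{for each } i = 1, 2, \ldots, n - 1, \\
Q_{2,i} &= \left(\frac{1}{h_{i-1}}\right)^2 \int_{x_{i-1}}^{x_i} (x - x_{i-1})^2 q(x)\,dx, & \text{for each } i = 1, 2, \ldots, n, \\
Q_{3,i} &= \left(\frac{1}{h_i}\right)^2 \int_{x_i}^{x_{i+1}} (x_{i+1} - x)^2 q(x)\,dx, & \text{for each } i = 1, 2, \ldots, n, \\
Q_{4,i} &= \left(\frac{1}{h_{i-1}}\right)^2 \int_{x_{i-1}}^{x_i} p(x)\,dx, & \text{for each } i = 1, 2, \ldots, n + 1, \\
Q_{5,i} &= \frac{1}{h_{i-1}} \int_{x_{i-1}}^{x_i} (x - x_{i-1}) f(x)\,dx, & \text{for each } i = 1, 2, \ldots, n,
\end{aligned}
\]

and

\[ Q_{6,i} = \frac{1}{h_i} \int_{x_i}^{x_{i+1}} (x_{i+1} - x) f(x)\,dx, \quad \text{for each } i = 1, 2, \ldots, n. \]

The matrix $A$ and the vector $\mathbf{b}$ in the linear system $A\mathbf{c} = \mathbf{b}$ have the entries

\[
\begin{aligned}
a_{i,i} &= Q_{4,i} + Q_{4,i+1} + Q_{2,i} + Q_{3,i}, & \text{for each } i = 1, 2, \ldots, n, \\
a_{i,i+1} &= -Q_{4,i+1} + Q_{1,i}, & \text{for each } i = 1, 2, \ldots, n - 1, \\
a_{i,i-1} &= -Q_{4,i} + Q_{1,i-1}, & \text{for each } i = 2, 3, \ldots, n,
\end{aligned}
\]

and

\[ b_i = Q_{5,i} + Q_{6,i}, \quad \text{for each } i = 1, 2, \ldots, n. \]

The entries in $\mathbf{c}$ are the unknown coefficients $c_1, c_2, \ldots, c_n$, from which the Rayleigh-Ritz
approximation $\phi$, given by $\phi(x) = \sum_{i=1}^n c_i\phi_i(x)$, is constructed.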

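To employ this method requires evaluating $6n$ integrals, which can be evaluated either
directly or by a quadrature formula such as Composite Simpson's rule.

An alternative approach for the integral evaluation is to approximate each of the
functions $p$, $q$, and $f$ with its piecewise-linear interpolating polynomial and then integrate the
approximation. Consider, for example, the integral $Q_{1,i}$. The piecewise-linear interpolation
of $q$ is

\[ P_q(x) = \sum_{i=0}^{n+1} q(x_i)\phi_i(x), \]

where $\phi_1, \ldots, \phi_n$ are defined in (11.29) and

\[
\phi_0(x) =
\begin{cases}
\dfrac{x_1 - x}{x_1}, & \text{if } 0 \le x \le x_1, \\[1ex]
0, & \text{elsewhere}
\end{cases}
\quad \text{and} \quad
\phi_{n+1}(x) =
\begin{cases}
\dfrac{x - x_n}{1 - x_n}, & \text{if } x_n \le x \le 1, \\[1ex]
0, & \text{elsewhere.}
\end{cases}
\]

The interval of integration is $[x_i, x_{i+1}]$, so the piecewise polynomial $P_q(x)$ reduces to

\[ P_q(x) = q(x_i)\phi_i(x) + q(x_{i+1})\phi_{i+1}(x). \]

This is the first-degree interpolating polynomial studied in Section 3.1. By Theorem 3.3 on
page 112,

\[ |q(x) - P_q(x)| = O(h_i^2), \quad \text{for } x_i \le x \le x_{i+1}, \]

if $q \in C^2[x_i, x_{i+1}]$. For $i = 1, 2, \ldots, n - 1$, the approximation to $Q_{1,i}$ is obtained by integrating
the approximation to the integrand:

\[
\begin{aligned}
Q_{1,i} &= \left(\frac{1}{h_i}\right)^2 \int_{x_i}^{x_{i+1}} (x_{i+1} - x)(x - x_i) q(x)\,dx \\
&\approx \left(\frac{1}{h_i}\right)^2 \int_{x_i}^{x_{i+1}} (x_{i+1} - x)(x - x_i) \left[ \frac{q(x_i)(x_{i+1} - x)}{h_i} + \frac{q(x_{i+1})(x - x_i)}{h_i} \right] dx \\
&= \frac{h_i}{12}\left[q(x_i) + q(x_{i+1})\right].
\end{aligned}
\]

Further, if $q \in C^2[x_i, x_{i+1}]$, then

\[ \left| Q_{1,i} - \frac{h_i}{12}\left[q(x_i) + q(x_{i+1})\right] \right| = O(h_i^3). \]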
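Approximations to the other integrals are derived in a similar manner and are given by

\[
\begin{aligned}
Q_{2,i} &\approx \frac{h_{i-1}}{12}\left[3q(x_i) + q(x_{i-1})\right], & Q_{3,i} &\approx \frac{h_i}{12}\left[3q(x_i) + q(x_{i+1})\right], \\
Q_{4,i} &\approx \frac{1}{2h_{i-1}}\left[p(x_i) + p(x_{i-1})\right], & Q_{5,i} &\approx \frac{h_{i-1}}{6}\left[2f(x_i) + f(x_{i-1})\right],
\end{aligned}
\]

and

\[ Q_{6,i} \approx \frac{h_i}{6}\left[2f(x_i) + f(x_{i+1})\right]. \]

The factor $1/(2h_{i-1})$ in the approximation to $Q_{4,i}$ follows from the definition
$Q_{4,i} = (1/h_{i-1})^2 \int_{x_{i-1}}^{x_i} p(x)\,dx$: integrating the linear interpolant of $p$ gives
$(h_{i-1}/2)[p(x_i) + p(x_{i-1})]$, which is then divided by $h_{i-1}^2$.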
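The size of the error in the approximation to $Q_{1,i}$ can be checked numerically. The sketch below compares the closed form $\frac{h_i}{12}[q(x_i) + q(x_{i+1})]$ with a Composite Simpson's rule evaluation of the defining integral. The coefficient $q(x) = e^x$ and the subinterval $[0.3, 0.4]$ are hypothetical choices made only for this illustration.

```python
import math


def simpson(g, lo, hi, m=200):
    """Composite Simpson's rule with m (even) subintervals."""
    step = (hi - lo) / m
    total = g(lo) + g(hi)
    for k in range(1, m):
        total += (4 if k % 2 else 2) * g(lo + k * step)
    return total * step / 3


q = math.exp            # hypothetical smooth coefficient q(x)
xi, xi1 = 0.3, 0.4      # one subinterval [x_i, x_{i+1}]
h = xi1 - xi

# Q_{1,i} from its definition, evaluated by quadrature.
q1_quadrature = simpson(lambda x: (xi1 - x) * (x - xi) * q(x), xi, xi1) / h**2
# Q_{1,i} from integrating the piecewise-linear interpolant of q.
q1_linearized = h / 12 * (q(xi) + q(xi1))

gap = abs(q1_quadrature - q1_linearized)
```

For this choice the gap is of the order $h_i^3$, consistent with the bound stated above; halving the subinterval length should reduce it by roughly a factor of eight.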


Algorithm 11.5 sets up the tridiagonal linear system and incorporates the Crout
Factorization Algorithm 6.7 to solve the system. The integrals $Q_{1,i}, \ldots, Q_{6,i}$ can be computed
by one of the methods mentioned previously.

Piecewise Linear Rayleigh-Ritz

To approximate the solution to the boundary-value problem

\[ -\frac{d}{dx}\left(p(x)\frac{dy}{dx}\right) + q(x)y = f(x), \quad \text{for } 0 \le x \le 1, \text{ with } y(0) = 0 \text{ and } y(1) = 0 \]

with the piecewise linear function

\[ \phi(x) = \sum_{i=1}^n c_i\phi_i(x): \]

INPUT integer $n \ge 1$; points $x_0 = 0 < x_1 < \cdots < x_n < x_{n+1} = 1$.

OUTPUT coefficients $c_1, \ldots, c_n$.

Step 1 For $i = 0, \ldots, n$ set $h_i = x_{i+1} - x_i$.

Step 2 For $i = 1, \ldots, n$ define the piecewise linear basis $\phi_i$ by

\[
\phi_i(x) =
\begin{cases}
0, & 0 \le x \le x_{i-1}, \\[1ex]
\dfrac{x - x_{i-1}}{h_{i-1}}, & x_{i-1} < x \le x_i, \\[1ex]
\dfrac{x_{i+1} - x}{h_i}, & x_i < x \le x_{i+1}, \\[1ex]
0, & x_{i+1} < x \le 1.
\end{cases}
\]

Step 3 For each $i = 1, 2, \ldots, n - 1$ compute $Q_{1,i}, Q_{2,i}, Q_{3,i}, Q_{4,i}, Q_{5,i}, Q_{6,i}$;
Compute $Q_{2,n}, Q_{3,n}, Q_{4,n}, Q_{4,n+1}, Q_{5,n}, Q_{6,n}$.

Step 4 For each $i = 1, 2, \ldots, n - 1$, set $\alpha_i = Q_{4,i} + Q_{4,i+1} + Q_{2,i} + Q_{3,i}$;
$\beta_i = Q_{1,i} - Q_{4,i+1}$;
$b_i = Q_{5,i} + Q_{6,i}$.

Step 5 Set $\alpha_n = Q_{4,n} + Q_{4,n+1} + Q_{2,n} + Q_{3,n}$;
$b_n = Q_{5,n} + Q_{6,n}$.

Step 6 Set $a_1 = \alpha_1$; (Steps 6-10 solve a symmetric tridiagonal linear system using
Algorithm 6.7.)
$\zeta_1 = \beta_1/\alpha_1$;
$z_1 = b_1/a_1$.

Step 7 For $i = 2, \ldots, n - 1$ set $a_i = \alpha_i - \beta_{i-1}\zeta_{i-1}$;
$\zeta_i = \beta_i/a_i$;
$z_i = (b_i - \beta_{i-1}z_{i-1})/a_i$.

Step 8 Set $a_n = \alpha_n - \beta_{n-1}\zeta_{n-1}$;
$z_n = (b_n - \beta_{n-1}z_{n-1})/a_n$.

Step 9 Set $c_n = z_n$;
OUTPUT $(c_n)$.

Step 10 For $i = n - 1, \ldots, 1$ set $c_i = z_i - \zeta_i c_{i+1}$;
OUTPUT $(c_i)$.

Step 11 STOP. (The procedure is complete.)


The following uses Algorithm 11.5. Because of the elementary nature of this example, 
the integrals in Steps 3, 4, and 5 were found directly. 


Illustration Consider the boundary-value problem

\[ -y'' + \pi^2 y = 2\pi^2\sin(\pi x), \quad \text{for } 0 \le x \le 1, \text{ with } y(0) = y(1) = 0. \]

Let $h_i = h = 0.1$, so that $x_i = 0.1i$, for each $i = 0, 1, \ldots, 9$. The integrals are

\[
\begin{aligned}
Q_{1,i} &= 100\int_{0.1i}^{0.1i+0.1} (0.1i + 0.1 - x)(x - 0.1i)\pi^2\,dx = \frac{\pi^2}{60}, \\
Q_{2,i} &= 100\int_{0.1i-0.1}^{0.1i} (x - 0.1i + 0.1)^2\pi^2\,dx = \frac{\pi^2}{30}, \\
Q_{3,i} &= 100\int_{0.1i}^{0.1i+0.1} (0.1i + 0.1 - x)^2\pi^2\,dx = \frac{\pi^2}{30}, \\
Q_{4,i} &= 100\int_{0.1i-0.1}^{0.1i} dx = 10, \\
Q_{5,i} &= 10\int_{0.1i-0.1}^{0.1i} (x - 0.1i + 0.1)\,2\pi^2\sin\pi x\,dx \\
&= -2\pi\cos 0.1\pi i + 20[\sin(0.1\pi i) - \sin((0.1i - 0.1)\pi)],
\end{aligned}
\]

and

\[
\begin{aligned}
Q_{6,i} &= 10\int_{0.1i}^{0.1i+0.1} (0.1i + 0.1 - x)\,2\pi^2\sin\pi x\,dx \\
&= 2\pi\cos 0.1\pi i - 20[\sin((0.1i + 0.1)\pi) - \sin(0.1\pi i)].
\end{aligned}
\]

The linear system $A\mathbf{c} = \mathbf{b}$ has

\[
\begin{aligned}
a_{i,i} &= 20 + \frac{\pi^2}{15}, & \text{for each } i = 1, 2, \ldots, 9, \\
a_{i,i+1} &= -10 + \frac{\pi^2}{60}, & \text{for each } i = 1, 2, \ldots, 8, \\
a_{i,i-1} &= -10 + \frac{\pi^2}{60}, & \text{for each } i = 2, 3, \ldots, 9,
\end{aligned}
\]

and

\[ b_i = 40\sin(0.1\pi i)[1 - \cos 0.1\pi], \quad \text{for each } i = 1, 2, \ldots, 9. \]

The solution to the tridiagonal linear system is

\[
\begin{aligned}
c_9 &= 0.3102866742, & c_8 &= 0.5902003271, & c_7 &= 0.8123410598, \\
c_6 &= 0.9549641893, & c_5 &= 1.004108771, & c_4 &= 0.9549641893, \\
c_3 &= 0.8123410598, & c_2 &= 0.5902003271, & c_1 &= 0.3102866742.
\end{aligned}
\]

The piecewise-linear approximation is

\[ \phi(x) = \sum_{i=1}^9 c_i\phi_i(x), \]

and the actual solution to the boundary-value problem is $y(x) = \sin\pi x$. Table 11.7 lists the
error in the approximation at $x_i$, for each $i = 1, \ldots, 9$.


i    x_i    φ(x_i)          y(x_i)          |φ(x_i) - y(x_i)|
1    0.1    0.3102866742    0.3090169943    0.00127
2    0.2    0.5902003271    0.5877852522    0.00241
3    0.3    0.8123410598    0.8090169943    0.00332
4    0.4    0.9549641896    0.9510565162    0.00390
5    0.5    1.0041087710    1.0000000000    0.00411
6    0.6    0.9549641893    0.9510565162    0.00390
7    0.7    0.8123410598    0.8090169943    0.00332
8    0.8    0.5902003271    0.5877852522    0.00241
9    0.9    0.3102866742    0.3090169943    0.00127
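The tridiagonal system of this Illustration can be assembled and solved in a few lines. The sketch below uses the closed-form entries derived above and the forward and backward sweeps of Steps 6-10 of Algorithm 11.5; the variable names are chosen here for illustration and lists are indexed from 0, so `c[i - 1]` holds $c_i$.

```python
import math

n = 9
diag = 20 + math.pi**2 / 15       # a_{i,i}
off = -10 + math.pi**2 / 60       # a_{i,i+1} = a_{i,i-1}
b = [40 * math.sin(0.1 * math.pi * i) * (1 - math.cos(0.1 * math.pi))
     for i in range(1, n + 1)]

# Forward sweep of the symmetric tridiagonal factorization (Steps 6-8).
a = [0.0] * n
zeta = [0.0] * n
z = [0.0] * n
a[0] = diag
zeta[0] = off / a[0]
z[0] = b[0] / a[0]
for i in range(1, n):
    a[i] = diag - off * zeta[i - 1]
    zeta[i] = off / a[i]
    z[i] = (b[i] - off * z[i - 1]) / a[i]

# Back substitution (Steps 9-10).
c = [0.0] * n
c[n - 1] = z[n - 1]
for i in range(n - 2, -1, -1):
    c[i] = z[i] - zeta[i] * c[i + 1]

# At a node x_i the piecewise-linear approximation equals c_i.
errors = [abs(c[i - 1] - math.sin(0.1 * math.pi * i)) for i in range(1, n + 1)]
```

Because $\phi_i(x_i) = 1$ and every other basis function vanishes at $x_i$, the list `errors` corresponds to the last column of Table 11.7.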


It can be shown that the tridiagonal matrix $A$ given by the piecewise-linear basis
functions is positive definite (see Exercise 12), so, by Theorem 6.26 on page 417, the linear
system is stable with respect to roundoff error. Under the hypotheses presented at the
beginning of this section, we have

\[ |\phi(x) - y(x)| = O(h^2), \quad \text{for each } x \text{ in } [0, 1]. \]

A proof of this result can be found in [Schul], pp. 103-104.


B-Spline Basis 


The use of piecewise-linear basis functions results in an approximate solution to Eqs. (11.22)
and (11.23) that is continuous but not differentiable on $[0, 1]$. A more sophisticated set of
basis functions is required to construct an approximation that belongs to $C_0^2[0, 1]$. These
basis functions are similar to the cubic interpolatory splines discussed in Section 3.5.

Recall that the cubic interpolatory spline $S$ on the five nodes $x_0, x_1, x_2, x_3$, and $x_4$ for a
function $f$ is defined by:

(a) $S(x)$ is a cubic polynomial, denoted $S_j(x)$, on the subinterval $[x_j, x_{j+1}]$ for each
$j = 0, 1, 2, 3$;
(b) $S_j(x_j) = f(x_j)$ and $S_j(x_{j+1}) = f(x_{j+1})$ for each $j = 0, 1, 2, 3$;
(c) $S_{j+1}(x_{j+1}) = S_j(x_{j+1})$ for each $j = 0, 1, 2$; (Implied by (b).)
(d) $S_{j+1}'(x_{j+1}) = S_j'(x_{j+1})$ for each $j = 0, 1, 2$;
(e) $S_{j+1}''(x_{j+1}) = S_j''(x_{j+1})$ for each $j = 0, 1, 2$;
(f) One of the following sets of boundary conditions is satisfied:
    (i) $S''(x_0) = S''(x_n) = 0$ (natural (or free) boundary);
    (ii) $S'(x_0) = f'(x_0)$ and $S'(x_n) = f'(x_n)$ (clamped boundary).

Since uniqueness of solution requires the number of constants in (a), 16, to equal the
number of conditions in (b) through (f), only one of the boundary conditions in (f) can be
specified for the interpolatory cubic splines.

The cubic spline functions we will use for our basis functions are called B-splines, or
bell-shaped splines. These differ from interpolatory splines in that both sets of boundary
conditions in (f) are satisfied. This requires the relaxation of two of the conditions in (b)
through (e). Since the spline must have two continuous derivatives on $[x_0, x_4]$, we delete
two of the interpolation conditions from the description of the interpolatory splines. In
particular, we modify condition (b) to

b. $S(x_j) = f(x_j)$ for $j = 0, 2, 4$.

B- (for "Basis") splines were introduced in 1946 by I. J. Schoenberg [Scho], but for more
than a decade were difficult to compute. In 1972, Carl de Boor (1937-) [Deb1] described
recursion formulae for evaluation which improved their stability and utility.

For example, the basic B-spline $S$ defined next and shown in Figure 11.5 uses the
equally spaced nodes $x_0 = -2$, $x_1 = -1$, $x_2 = 0$, $x_3 = 1$, and $x_4 = 2$. It satisfies the
interpolatory conditions

b. $S(x_0) = 0$, $S(x_2) = 1$, $S(x_4) = 0$;

as well as both sets of conditions

(i) $S''(x_0) = S''(x_4) = 0$ and (ii) $S'(x_0) = S'(x_4) = 0$.

As a consequence, $S \in C_0^2(-\infty, \infty)$, and is given specifically as

\[
S(x) =
\begin{cases}
0, & \text{if } x \le -2, \\
\frac{1}{4}(2 + x)^3, & \text{if } -2 \le x \le -1, \\
\frac{1}{4}\left[(2 + x)^3 - 4(1 + x)^3\right], & \text{if } -1 < x \le 0, \\
\frac{1}{4}\left[(2 - x)^3 - 4(1 - x)^3\right], & \text{if } 0 < x \le 1, \\
\frac{1}{4}(2 - x)^3, & \text{if } 1 < x \le 2, \\
0, & \text{if } 2 < x.
\end{cases}
\]

[Figure 11.5]
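The piecewise definition of the basic B-spline translates directly into code. The following is a small sketch (the function name `S` mirrors the text; the spot checks are added here for illustration) that evaluates $S$ and confirms the interpolatory values at the nodes $-2, -1, 0, 1, 2$.

```python
def S(x):
    """Basic cubic B-spline on the equally spaced nodes -2, -1, 0, 1, 2."""
    if x <= -2 or x >= 2:
        return 0.0
    if x <= -1:
        return 0.25 * (2 + x) ** 3
    if x <= 0:
        return 0.25 * ((2 + x) ** 3 - 4 * (1 + x) ** 3)
    if x <= 1:
        return 0.25 * ((2 - x) ** 3 - 4 * (1 - x) ** 3)
    return 0.25 * (2 - x) ** 3


# Values at the five nodes: S(-2) = 0, S(-1) = 1/4, S(0) = 1, S(1) = 1/4, S(2) = 0.
node_values = [S(x) for x in (-2, -1, 0, 1, 2)]
```

The values at $x = \pm 1$ are not prescribed by the modified condition (b); they follow from the smoothness and boundary conditions, and the symmetry $S(-x) = S(x)$ is visible in the formula.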
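We will now use this basic B-spline to construct the basis functions $\phi_i$ in $C_0^2[0, 1]$.
We first partition $[0, 1]$ by choosing a positive integer $n$ and defining $h = 1/(n + 1)$. This
produces the equally-spaced nodes $x_i = ih$, for each $i = 0, 1, \ldots, n + 1$. We then define the
basis functions $\{\phi_i\}_{i=0}^{n+1}$ as

\[
\phi_i(x) =
\begin{cases}
S\left(\dfrac{x}{h}\right) - 4S\left(\dfrac{x + h}{h}\right), & \text{if } i = 0, \\[2ex]
S\left(\dfrac{x - h}{h}\right) - S\left(\dfrac{x + h}{h}\right), & \text{if } i = 1, \\[2ex]
S\left(\dfrac{x - ih}{h}\right), & \text{if } 2 \le i \le n - 1, \\[2ex]
S\left(\dfrac{x - nh}{h}\right) - S\left(\dfrac{x - (n + 2)h}{h}\right), & \text{if } i = n, \\[2ex]
S\left(\dfrac{x - (n + 1)h}{h}\right) - 4S\left(\dfrac{x - (n + 2)h}{h}\right), & \text{if } i = n + 1.
\end{cases}
\]

It is not difficult to show that $\{\phi_i\}_{i=0}^{n+1}$ is a linearly independent set of cubic splines satisfying
$\phi_i(0) = \phi_i(1) = 0$, for each $i = 0, 1, \ldots, n, n + 1$ (see Exercise 11). The graphs of $\phi_i$, for
$2 \le i \le n - 1$, are shown in Figure 11.6, and the graphs of $\phi_0$, $\phi_1$, $\phi_n$, and $\phi_{n+1}$ are in
Figure 11.7.

[Figure 11.6: $y = \phi_i(x)$ when $i = 2, \ldots, n - 1$]

Since $\phi_i(x)$ and $\phi_i'(x)$ are nonzero only for $x \in [x_{i-2}, x_{i+2}]$, the matrix in the Rayleigh-
Ritz approximation is a band matrix with bandwidth at most seven:

\[
A =
\begin{bmatrix}
a_{00} & a_{01} & a_{02} & a_{03} & 0 & \cdots & \cdots & 0 \\
a_{10} & a_{11} & a_{12} & a_{13} & a_{14} & \ddots & & \vdots \\
a_{20} & a_{21} & a_{22} & a_{23} & a_{24} & a_{25} & \ddots & \vdots \\
a_{30} & a_{31} & a_{32} & a_{33} & a_{34} & a_{35} & a_{36} & \vdots \\
0 & \ddots & \ddots & \ddots & \ddots & \ddots & \ddots & 0 \\
\vdots & \ddots & \ddots & \ddots & \ddots & \ddots & \ddots & a_{n-2,n+1} \\
\vdots & & \ddots & \ddots & \ddots & \ddots & \ddots & a_{n-1,n+1} \\
\vdots & & & \ddots & \ddots & \ddots & \ddots & a_{n,n+1} \\
0 & \cdots & \cdots & 0 & a_{n+1,n-2} & a_{n+1,n-1} & a_{n+1,n} & a_{n+1,n+1}
\end{bmatrix},
\]

[Figure 11.7: graphs of $\phi_0$, $\phi_1$, $\phi_n$, and $\phi_{n+1}$]

where

\[ a_{ij} = \int_0^1 \left\{ p(x)\phi_i'(x)\phi_j'(x) + q(x)\phi_i(x)\phi_j(x) \right\} dx, \]

for each $i, j = 0, 1, \ldots, n + 1$. The vector $\mathbf{b}$ has the entries

\[ b_i = \int_0^1 f(x)\phi_i(x)\,dx. \]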


The matrix A is positive definite (see Exercise 13), so the linear system Ac = b can be 
solved by Cholesky’s Algorithm 6.6 or by Gaussian elimination. Algorithm 11.6 details the 
construction of the cubic spline approximation ¢(x) by the Rayleigh-Ritz method for the 
boundary-value problem (11.21) and (11.22) given at the beginning of this section. 


Cubic Spline Rayleigh-Ritz 


To approximate the solution to the boundary-value problem 


ae (vo) + q(x)y = f(@), for0 <x < 1, with yO) = 0and y(1) = 0 
dx dx 


with the sum of cubic splines 


n+1 


o(x) = >> eibi(x) : 


i=0 
INPUT  integern > 1. 
OUTPUT coefficients co,..., Cn41. 
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Step 1 Seth=1/(n+ 1). 


Step 2 Fori=0,...,n+1 set x; = ih. 
Set x2 = x1 = 03 Xpyo = Xn43 = 1. 


Step 3 Define the function S by 
0, x <2, 


i(2+x), —22x¢ esi, 
t[2+x-4 +x], -1<x<0, 


S@) = 
i [2-x%-40—-x) 7], O<x<1, 
i(2—x), 1 2x2 2, 
0, 2<Xx 


Step 4 Define the cubic spline basis {bi}rtg by 


oo) = 5 (7) 45 (“E"), 


_ xX—X, x+h 
nas (8) -5(4), 


g(x) = 8 ¢ — 


= X—Xn\ | x—(n+2)h 
tio a5 (7B) 5 (TOBY, 


— Xp - 2)h 
Pnii(x) = S Cae) —4s as) . 


Step 5 For $i = 0, \ldots, n+1$ do Steps 6-9.
(Note: The integrals in Steps 6 and 9 can be evaluated using a numerical
integration procedure.)

Step 6 For $j = i, i+1, \ldots, \min\{i+3, n+1\}$
set $L = \max\{x_{j-2}, 0\}$;
$U = \min\{x_{i+2}, 1\}$;
$a_{ij} = \int_L^U \left[ p(x)\phi_i'(x)\phi_j'(x) + q(x)\phi_i(x)\phi_j(x) \right] dx$;
if $i \ne j$, then set $a_{ji} = a_{ij}$. (Since $A$ is symmetric.)
Step 7 If $i \ge 4$ then for $j = 0, \ldots, i-4$ set $a_{ij} = 0$.
Step 8 If $i \le n-3$ then for $j = i+4, \ldots, n+1$ set $a_{ij} = 0$.
Step 9 Set $L = \max\{x_{i-2}, 0\}$;
$U = \min\{x_{i+2}, 1\}$;
$b_i = \int_L^U f(x)\phi_i(x)\,dx$.
Step 10 Solve the linear system $A\mathbf{c} = \mathbf{b}$, where $A = (a_{ij})$, $\mathbf{b} = (b_0, \ldots, b_{n+1})^t$ and
$\mathbf{c} = (c_0, \ldots, c_{n+1})^t$.

Step 11 For $i = 0, \ldots, n+1$
OUTPUT ($c_i$).

Step 12 STOP. (The procedure is complete.)
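Steps 3 and 4 can be sketched in a few lines of Python. This is only an illustrative sketch, not part of Algorithm 11.6: the function names `S` and `phi` are chosen here for convenience, a uniform mesh $x_i = ih$ is assumed, and $n \ge 2$ is assumed so that the special cases $i = 1$ and $i = n$ do not coincide.

```python
def S(x):
    """Cubic B-spline of Step 3, with nodes at -2, -1, 0, 1, 2."""
    if x <= -2 or x > 2:
        return 0.0
    if x <= -1:
        return 0.25 * (2 + x) ** 3
    if x <= 0:
        return 0.25 * ((2 + x) ** 3 - 4 * (1 + x) ** 3)
    if x <= 1:
        return 0.25 * ((2 - x) ** 3 - 4 * (1 - x) ** 3)
    return 0.25 * (2 - x) ** 3


def phi(i, x, n):
    """Basis function phi_i of Step 4 on the mesh x_i = i*h, h = 1/(n+1).

    Assumes n >= 2 (an assumption of this sketch).
    """
    h = 1.0 / (n + 1)
    if i == 0:
        return S(x / h) - 4 * S((x + h) / h)
    if i == 1:
        return S((x - h) / h) - S((x + h) / h)
    if i == n:
        return S((x - n * h) / h) - S((x - (n + 2) * h) / h)
    if i == n + 1:
        return S((x - (n + 1) * h) / h) - 4 * S((x - (n + 2) * h) / h)
    return S((x - i * h) / h)


# Every basis function should vanish at both endpoints, which is what the
# modified definitions of phi_0, phi_1, phi_n, and phi_{n+1} are for.
n = 9
boundary_values = [(phi(i, 0.0, n), phi(i, 1.0, n)) for i in range(n + 2)]
```

The corrections subtracted in the four special cases are what force $\phi_i(0) = \phi_i(1) = 0$, so any linear combination of the basis satisfies the boundary conditions.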
Illustration Consider the boundary-value problem

$$-y'' + \pi^2 y = 2\pi^2 \sin(\pi x), \quad \text{for } 0 \le x \le 1, \text{ with } y(0) = y(1) = 0.$$

In the Illustration following Algorithm 11.5 we let h = 0.1 and generated approximations
using piecewise-linear basis functions. Table 11.8 lists the results obtained by applying the
B-splines as detailed in Algorithm 11.6 with this same choice of nodes.

Table 11.8

 i    c_i                    x_i   phi(x_i)     y(x_i)       |y(x_i) - phi(x_i)|
 0    0.50964361 x 10^-5     0.0   0.00000000   0.00000000   0.00000000
 1    0.20942608             0.1   0.30901644   0.30901699   0.00000055
 2    0.39835678             0.2   0.58778549   0.58778525   0.00000024
 3    0.54828946             0.3   0.80901687   0.80901699   0.00000012
 4    0.64455358             0.4   0.95105667   0.95105652   0.00000015
 5    0.67772340             0.5   1.00000002   1.00000000   0.00000020
 6    0.64455370             0.6   0.95105713   0.95105652   0.00000061
 7    0.54828951             0.7   0.80901773   0.80901699   0.00000074
 8    0.39835730             0.8   0.58778690   0.58778525   0.00000165
 9    0.20942593             0.9   0.30901810   0.30901699   0.00000111
 10   0.74931285 x 10^-5     1.0   0.00000000   0.00000000   0.00000000


We recommend that the integrations in Steps 6 and 9 be performed in two steps. First, 
construct cubic spline interpolatory polynomials for p, q, and f using the methods presented
in Section 3.5. Then approximate the integrands by products of cubic splines or derivatives 
of cubic splines. The integrands are now piecewise polynomials and can be integrated 
exactly on each subinterval, and then summed. This leads to accurate approximations of 
the integrals. 

The hypotheses assumed at the beginning of this section are sufficient to guarantee that

$$\left\{ \int_0^1 |y(x) - \phi(x)|^2\,dx \right\}^{1/2} = O(h^4), \quad \text{if } 0 \le x \le 1.$$


For a proof of this result, see [Schul], pp. 107-108. 

B-splines can also be defined for unequally-spaced nodes, but the details are more com- 
plicated. A presentation of the technique can be found in [Schul], p. 73. Another commonly 
used basis is the piecewise cubic Hermite polynomials. For an excellent presentation of this 
method, again see [Schul], pp. 24ff. 


Other methods that receive considerable attention are Galerkin, or “weak form,” methods.
For the boundary-value problem we have been considering,

$$-\frac{d}{dx}\left(p(x)\frac{dy}{dx}\right) + q(x)y = f(x), \quad \text{for } 0 \le x \le 1, \text{ with } y(0) = 0 \text{ and } y(1) = 0,$$

under the assumptions listed at the beginning of this section, the Galerkin and Rayleigh-Ritz
methods are both determined by Eq. (11.29). However, this is not the case for an arbitrary
boundary-value problem. A treatment of the similarities and differences in the two methods
and a discussion of the wide application of the Galerkin method can be found in [Schul]
and in [SF].

Boris Grigorievich Galerkin (1871-1945) did fundamental work applying approximation
techniques to solve boundary-value problems associated with civil engineering problems.
His initial paper on finite-element analysis was published in 1915, and his fundamental
manuscript on thin elastic plates in 1937.

The word collocation has its root in the Latin “co-” and “locus” indicating together with
and place. It is equivalent to what we call interpolation.


Boundary-Value Problems for Ordinary Differential Equations 


Another popular technique for solving boundary-value problems is the method of 
collocation. 

This procedure begins by selecting a set of basis functions $\{\phi_1, \ldots, \phi_N\}$, a set of numbers
$\{x_1, \ldots, x_n\}$ in [0, 1], and requiring that an approximation

$$\sum_{i=1}^{N} c_i\phi_i(x)$$

satisfy the differential equation at each of the numbers $x_j$, for $1 \le j \le n$. If, in addition,
it is required that $\phi_i(0) = \phi_i(1) = 0$, for $1 \le i \le N$, then the boundary conditions are
automatically satisfied. Much attention in the literature has been given to the choice of the
numbers $\{x_j\}$ and the basis functions $\{\phi_i\}$. One popular choice is to let the $\phi_i$ be the basis
functions for spline functions relative to a partition of [0, 1], and to let the nodes $\{x_j\}$ be
the Gaussian points or roots of certain orthogonal polynomials, transformed to the proper
subinterval.

A comparison of various collocation methods and finite difference methods is con- 
tained in [Ru]. The conclusion is that the collocation methods using higher-degree splines 
are competitive with finite-difference techniques using extrapolation. Other references for 
collocation methods are [DebS] and [LR]. 


EXERCISE SET 11.5 


1. Use the Piecewise Linear Algorithm to approximate the solution to the boundary-value problem

$$y'' + \frac{\pi^2}{4}y = \frac{\pi^2}{16}\cos\frac{\pi}{4}x, \quad 0 \le x \le 1, \quad y(0) = y(1) = 0,$$

using $x_0 = 0$, $x_1 = 0.3$, $x_2 = 0.7$, $x_3 = 1$. Compare your results to the actual solution
$y(x) = -\frac{1}{3}\cos\frac{\pi}{2}x - \frac{\sqrt{2}}{6}\sin\frac{\pi}{2}x + \frac{1}{3}\cos\frac{\pi}{4}x$.

2. Use the Piecewise Linear Algorithm to approximate the solution to the boundary-value problem

$$-\frac{d}{dx}(xy') + 4y = 4x^2 - 8x + 1, \quad 0 \le x \le 1, \quad y(0) = y(1) = 0,$$

using $x_0 = 0$, $x_1 = 0.4$, $x_2 = 0.8$, $x_3 = 1$. Compare your results to the actual solution $y(x) = x^2 - x$.

3. Use the Piecewise Linear Algorithm to approximate the solutions to the following boundary-value
problems, and compare the results to the actual solution:

a. $-x^2y'' - 2xy' + 2y = -4x^2$, $0 \le x \le 1$, $y(0) = y(1) = 0$; use $h = 0.1$; actual solution
$y(x) = x^2 - x$.

b. $-\frac{d}{dx}(e^x y') + e^x y = x + (2-x)e^x$, $0 \le x \le 1$, $y(0) = y(1) = 0$; use $h = 0.1$; actual solution
$y(x) = (x-1)(e^{-x} - 1)$.

c. $-\frac{d}{dx}(e^{-x} y') + e^{-x} y = (x-1) - (x+1)e^{-(x-1)}$, $0 \le x \le 1$, $y(0) = y(1) = 0$; use $h = 0.05$;
actual solution $y(x) = x(e^x - e)$.

d. $-(x+1)y'' - y' + (x+2)y = [2 - (x+1)^2]\,e\ln 2 - 2e^x$, $0 \le x \le 1$, $y(0) = y(1) = 0$; use
$h = 0.05$; actual solution $y(x) = e^x\ln(x+1) - (e\ln 2)x$.

4. Use the Cubic Spline Algorithm with n = 3 to approximate the solution to each of the following
boundary-value problems, and compare the results to the actual solutions given in Exercises 1 and 2:

a. $y'' + \frac{\pi^2}{4}y = \frac{\pi^2}{16}\cos\frac{\pi}{4}x$, $0 \le x \le 1$, $y(0) = 0$, $y(1) = 0$

b. $-\frac{d}{dx}(xy') + 4y = 4x^2 - 8x + 1$, $0 \le x \le 1$, $y(0) = 0$, $y(1) = 0$

5. Repeat Exercise 3 using the Cubic Spline Algorithm.

6. Show that the boundary-value problem

$$-\frac{d}{dx}(p(x)y') + q(x)y = f(x), \quad 0 \le x \le 1, \quad y(0) = \alpha, \quad y(1) = \beta,$$

can be transformed by the change of variable

$$z = y - \beta x - (1 - x)\alpha$$

into the form

$$-\frac{d}{dx}(p(x)z') + q(x)z = F(x), \quad 0 \le x \le 1, \quad z(0) = 0, \quad z(1) = 0.$$

7. Use Exercise 6 and the Piecewise Linear Algorithm with n = 9 to approximate the solution to the
boundary-value problem

$$-y'' + y = x, \quad 0 \le x \le 1, \quad y(0) = 1, \quad y(1) = 1 + e^{-1}.$$

8. Repeat Exercise 7 using the Cubic Spline Algorithm.

9. Show that the boundary-value problem

$$-\frac{d}{dx}(p(x)y') + q(x)y = f(x), \quad a \le x \le b, \quad y(a) = \alpha, \quad y(b) = \beta,$$

can be transformed into the form

$$-\frac{d}{dw}(p(w)z') + q(w)z = F(w), \quad 0 \le w \le 1, \quad z(0) = 0, \quad z(1) = 0,$$

by a method similar to that given in Exercise 6.

10. Show that the piecewise-linear basis functions $\{\phi_i\}_{i=1}^{n}$ are linearly independent.

11. Show that the cubic spline basis functions $\{\phi_i\}_{i=0}^{n+1}$ are linearly independent.

12. Show that the matrix given by the piecewise linear basis functions is positive definite.

13. Show that the matrix given by the cubic spline basis functions is positive definite.


11.6 Survey of Methods and Software


In this chapter we discussed methods for approximating solutions to boundary-value prob- 
lems. For the linear boundary-value problem 


$$y'' = p(x)y' + q(x)y + r(x), \quad a \le x \le b, \quad y(a) = \alpha, \quad y(b) = \beta,$$

we considered both a linear shooting method and a finite-difference method to approximate
the solution. The shooting method uses an initial-value technique to solve the problems

$$y'' = p(x)y' + q(x)y + r(x), \quad \text{for } a \le x \le b, \text{ with } y(a) = \alpha \text{ and } y'(a) = 0,$$

and

$$y'' = p(x)y' + q(x)y, \quad \text{for } a \le x \le b, \text{ with } y(a) = 0 \text{ and } y'(a) = 1.$$


A weighted average of these solutions produces a solution to the linear boundary-value 
problem, although in certain situations there are problems with round-off error. 
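
As a reminder of what "weighted average" means here, the combination can be written out explicitly. The labels $y_1$ and $y_2$ are introduced only for this note: $y_1$ denotes the solution of the first initial-value problem above and $y_2$ the solution of the second. Assuming $y_2(b) \ne 0$,

$$y(x) = y_1(x) + \frac{\beta - y_1(b)}{y_2(b)}\,y_2(x).$$

Checking the endpoints confirms the construction: $y(a) = \alpha + 0 = \alpha$ and $y(b) = y_1(b) + (\beta - y_1(b)) = \beta$. The round-off difficulty arises when $y_1(b)$ is very large relative to $\beta$, since the subtraction $\beta - y_1(b)$ can then lose significant digits.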

In the finite-difference method, we replaced y” and y’ with difference approximations 
and solved a linear system. Although the approximations may not be as accurate as the 
shooting method, there is less sensitivity to roundoff error. Higher-order difference methods 
are available, or extrapolation can be used to improve accuracy. 


For the nonlinear boundary-value problem

$$y'' = f(x, y, y'), \quad \text{for } a \le x \le b, \text{ with } y(a) = \alpha \text{ and } y(b) = \beta,$$

we also considered two methods. The nonlinear shooting method requires the solution of
the initial-value problem

$$y'' = f(x, y, y'), \quad \text{for } a \le x \le b, \text{ with } y(a) = \alpha \text{ and } y'(a) = t,$$

for an initial choice of $t$. We improved the choice of $t$ by using Newton’s method to approximate
the solution to $y(b, t) = \beta$. This method required solving two initial-value problems
at each iteration. The accuracy is dependent on the choice of method for solving the initial- 
value problems. 

The finite-difference method for the nonlinear equation requires the replacement of y” 
and y’ by difference quotients, which results in a nonlinear system. This system is solved 
using Newton’s method. Higher-order differences or extrapolation can be used to improve 
accuracy. Finite-difference methods tend to be less sensitive to roundoff error than shooting 
methods. 

The Rayleigh-Ritz-Galerkin method was illustrated by approximating the solution to 
the boundary-value problem 


$$-\frac{d}{dx}\left(p(x)\frac{dy}{dx}\right) + q(x)y = f(x), \quad 0 \le x \le 1, \quad y(0) = y(1) = 0.$$
x. dx 


A piecewise-linear approximation or a cubic spline approximation can be obtained. 
Most of the material concerning second-order boundary-value problems can be ex- 
tended to problems with boundary conditions of the form 


$$\alpha_1 y(a) + \beta_1 y'(a) = \alpha \quad \text{and} \quad \alpha_2 y(b) + \beta_2 y'(b) = \beta,$$

where $|\alpha_1| + |\beta_1| \ne 0$ and $|\alpha_2| + |\beta_2| \ne 0$, but some of the techniques become quite
complicated. The reader who is interested in problems of this type is advised to consider a 
book specializing in boundary-value problems, such as [Keller, H]. 

The IMSL library has many subroutines for boundary-value problems. There are both 
shooting and finite difference methods. The shooting methods use the Runge-Kutta-Verner
technique for solving the associated initial-value problems. 

The NAG Library also has a multitude of subroutines for solving boundary-value 
problems. Some of these are a shooting method using the Runge-Kutta-Merson initial- 
value method in conjunction with Newton’s method, a finite-difference method with 
Newton’s method to solve the nonlinear system, and a linear finite-difference method based 
on collocation. 

There are subroutines in the ODE package contained in the Netlib library for solving 
both linear and nonlinear two-point boundary-value problems, respectively. These routines 
are based on multiple shooting methods. 

Further information on the general problems involved with the numerical solution to 
two-point boundary-value problems can be found in Keller [Keller, H] and Bailey, Shampine 
and Waltman [BSW]. Roberts and Shipman [RS] focuses on the shooting methods for the 
two-point boundary-value problem, and Pryce [Pr] restricts attention to Sturm-Liouville 
problems. The book by Ascher, Mattheij, and Russell [AMR] has a comprehensive presen- 
tation of multiple shooting and parallel shooting methods. 
Numerical Solutions to Partial Differential Equations


Introduction 


A body is isotropic if the thermal conductivity at each point in the body is independent
of the direction of heat flow through the point. Suppose that $k$, $c$, and $\rho$ are functions of
$(x, y, z)$ and represent, respectively, the thermal conductivity, specific heat, and density of
an isotropic body at the point $(x, y, z)$. Then the temperature, $u \equiv u(x, y, z, t)$, in the body
can be found by solving the partial differential equation

$$\frac{\partial}{\partial x}\left(k\frac{\partial u}{\partial x}\right) + \frac{\partial}{\partial y}\left(k\frac{\partial u}{\partial y}\right) + \frac{\partial}{\partial z}\left(k\frac{\partial u}{\partial z}\right) = c\rho\frac{\partial u}{\partial t}.$$

When $k$, $c$, and $\rho$ are constants, this equation is known as the simple three-dimensional heat
equation and is expressed as

$$\frac{\partial^2 u}{\partial x^2} + \frac{\partial^2 u}{\partial y^2} + \frac{\partial^2 u}{\partial z^2} = \frac{c\rho}{k}\frac{\partial u}{\partial t}.$$

If the boundary of the body is relatively simple, the solution to this equation can be found
using Fourier series.

In most situations where $k$, $c$, and $\rho$ are not constant or when the boundary is irregular,
the solution to the partial differential equation must be obtained by approximation
techniques. An introduction to techniques of this type is presented in this chapter.


Elliptic Equations 


Common partial differential equations are categorized in a manner similar to the conic sections.
The partial differential equation we will consider in Section 12.1 involves $u_{xx}(x, y) + u_{yy}(x, y)$
and is an elliptic equation. The particular elliptic equation we will consider is
known as the Poisson equation:

$$\frac{\partial^2 u}{\partial x^2}(x, y) + \frac{\partial^2 u}{\partial y^2}(x, y) = f(x, y).$$


In this equation we assume that f describes the input to the problem on a plane region R with 
boundary S. Equations of this type arise in the study of various time-independent physical 
problems such as the steady-state distribution of heat in a plane region, the potential energy 
of a point in a plane acted on by gravitational forces in the plane, and two-dimensional 
steady-state problems involving incompressible fluids. 




Siméon-Denis Poisson 
(1781-1840) was a student of 
Laplace and Legendre during the 
Napoleonic years in France. 
Later he assumed Fourier’s 
professorship at the Ecole 
Polytechnique where he worked 
on ordinary and partial 
differential equations, and later in 
life on probability theory. 


Figure 12.1 


Pierre-Simon Laplace 
(1749-1827) worked in many 
mathematical areas, producing 
seminal papers in probability and 
mathematical physics. He 
published his major work on the 
theory of heat during the period 
1817-1820. 


Johann Peter Gustav Lejeune 
Dirichlet (1805-1859) made 
major contributions to the areas 
of number theory and the 
convergence of series. In fact, he 
could be considered the founder 
of Fourier series, since according 
to Riemann he was the first to 
write a profound paper on this 
subject. 


Figure 12.2 




Additional constraints must be imposed to obtain a unique solution to the Poisson 
equation. For example, the study of the steady-state distribution of heat in a plane region 
requires that f(x, y) = 0, resulting in a simplification to Laplace’s equation 


$$\frac{\partial^2 u}{\partial x^2}(x, y) + \frac{\partial^2 u}{\partial y^2}(x, y) = 0.$$


If the temperature within the region is determined by the temperature distribution on 
the boundary of the region, the constraints are called the Dirichlet boundary conditions, 
given by 


u(x, y) = g(x,y), 


for all (x, y) on S, the boundary of the region R. (See Figure 12.1.) 


[Figure 12.1: region R with boundary S; at each boundary point (x, y) the temperature is held constant at g(x, y) degrees.]

Parabolic Equations 


In Section 12.2 we consider the numerical solution to a problem involving a parabolic 
partial differential equation of the form 


$$\frac{\partial u}{\partial t}(x, t) - \alpha^2\frac{\partial^2 u}{\partial x^2}(x, t) = 0.$$

The physical problem considered here concerns the flow of heat along a rod of length $l$ (see
Figure 12.2) which has a uniform temperature within each cross-sectional element. This
requires the rod to be perfectly insulated on its lateral surface. The constant $\alpha$ is assumed to
be independent of the position in the rod. It is determined by the heat-conductive properties
of the material of which the rod is composed.

One of the typical sets of constraints for a heat-flow problem of this type is to specify
the initial heat distribution in the rod,

$$u(x, 0) = f(x),$$

and to describe the behavior at the ends of the rod. For example, if the ends are held at
constant temperatures $U_1$ and $U_2$, the boundary conditions have the form

$$u(0, t) = U_1 \quad \text{and} \quad u(l, t) = U_2,$$

and the heat distribution approaches the limiting temperature distribution

$$\lim_{t \to \infty} u(x, t) = U_1 + \frac{U_2 - U_1}{l}x.$$

If, instead, the rod is insulated so that no heat flows through the ends, the boundary conditions
are

$$\frac{\partial u}{\partial x}(0, t) = 0 \quad \text{and} \quad \frac{\partial u}{\partial x}(l, t) = 0.$$


Then no heat escapes from the rod and in the limiting case the temperature on the rod is 
constant. The parabolic partial differential equation is also of importance in the study of 
gas diffusion; in fact, it is known in some circles as the diffusion equation. 


Hyperbolic Equations 


The problem studied in Section 12.3 is the one-dimensional wave equation and is an
example of a hyperbolic partial differential equation. Suppose an elastic string of length $l$
is stretched between two supports at the same horizontal level (see Figure 12.3).

[Figure 12.3: displacement of the string along x at a fixed time t.]

If the string is set to vibrate in a vertical plane, the vertical displacement $u(x, t)$ of a
point $x$ at time $t$ satisfies the partial differential equation

$$\alpha^2\frac{\partial^2 u}{\partial x^2}(x, t) - \frac{\partial^2 u}{\partial t^2}(x, t) = 0, \quad \text{for } 0 < x < l \text{ and } 0 < t,$$

provided that damping effects are neglected and the amplitude is not too large. To impose
constraints on this problem, assume that the initial position and velocity of the string are
given by

$$u(x, 0) = f(x) \quad \text{and} \quad \frac{\partial u}{\partial t}(x, 0) = g(x), \quad \text{for } 0 \le x \le l.$$

If the endpoints are fixed, we also have $u(0, t) = 0$ and $u(l, t) = 0$.

Other physical problems involving the hyperbolic partial differential equation occur 
in the study of vibrating beams with one or both ends clamped and in the transmission of 
electricity on a long line where there is some leakage of current to the ground. 
12.1 Elliptic Partial Differential Equations


The elliptic partial differential equation we consider is the Poisson equation, 


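$$\nabla^2 u(x, y) \equiv \frac{\partial^2 u}{\partial x^2}(x, y) + \frac{\partial^2 u}{\partial y^2}(x, y) = f(x, y) \tag{12.1}$$

on $R = \{(x, y) \mid a < x < b,\ c < y < d\}$, with $u(x, y) = g(x, y)$ for $(x, y) \in S$, where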
S denotes the boundary of R. If f and g are continuous on their domains, then there is a 
unique solution to this equation. 


Selecting a Grid 


The method used is a two-dimensional adaptation of the Finite-Difference method for linear 
boundary-value problems, which was discussed in Section 11.3. The first step is to choose 
integers n and m to define step sizes h = (b — a)/n and k = (d — c)/m. Partition the 
interval [a, b] into n equal parts of width h and the interval [c, d] into m equal parts of width 
k (see Figure 12.4). 


Figure 12.4 


Place a grid on the rectangle R by drawing vertical and horizontal lines through the
points with coordinates $(x_i, y_j)$, where

$$x_i = a + ih, \quad \text{for each } i = 0, 1, \ldots, n, \quad \text{and} \quad y_j = c + jk, \quad \text{for each } j = 0, 1, \ldots, m.$$

The lines $x = x_i$ and $y = y_j$ are grid lines, and their intersections are the mesh points of
the grid. For each mesh point in the interior of the grid, $(x_i, y_j)$, for $i = 1, 2, \ldots, n-1$ and
$j = 1, 2, \ldots, m-1$, we can use the Taylor series in the variable $x$ about $x_i$ to generate the
centered-difference formula

$$\frac{\partial^2 u}{\partial x^2}(x_i, y_j) = \frac{u(x_{i+1}, y_j) - 2u(x_i, y_j) + u(x_{i-1}, y_j)}{h^2} - \frac{h^2}{12}\frac{\partial^4 u}{\partial x^4}(\xi_i, y_j), \tag{12.2}$$

where $\xi_i \in (x_{i-1}, x_{i+1})$. We can also use the Taylor series in the variable $y$ about $y_j$ to
generate the centered-difference formula

$$\frac{\partial^2 u}{\partial y^2}(x_i, y_j) = \frac{u(x_i, y_{j+1}) - 2u(x_i, y_j) + u(x_i, y_{j-1})}{k^2} - \frac{k^2}{12}\frac{\partial^4 u}{\partial y^4}(x_i, \eta_j), \tag{12.3}$$

where $\eta_j \in (y_{j-1}, y_{j+1})$.
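
The $h^2/12$ error term in (12.2) can be checked numerically. The following is a small sketch, not taken from the text: the helper name `second_diff`, the test function $\sin x$, and the evaluation point 0.7 are arbitrary choices made for illustration. Halving the step size should reduce the error by a factor close to four.

```python
import math


def second_diff(u, x, h):
    """Centered-difference approximation to u''(x) with step h."""
    return (u(x + h) - 2.0 * u(x) + u(x - h)) / h ** 2


x0 = 0.7
exact = -math.sin(x0)  # second derivative of sin at x0

err_h = abs(second_diff(math.sin, x0, 0.1) - exact)
err_half = abs(second_diff(math.sin, x0, 0.05) - exact)

# For an O(h^2) formula the ratio of errors should be near 2^2 = 4.
ratio = err_h / err_half
print(err_h, err_half, ratio)
```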
Using these formulas in Eq. (12.1) allows us to express the Poisson equation at the
points $(x_i, y_j)$ as

$$\frac{u(x_{i+1}, y_j) - 2u(x_i, y_j) + u(x_{i-1}, y_j)}{h^2} + \frac{u(x_i, y_{j+1}) - 2u(x_i, y_j) + u(x_i, y_{j-1})}{k^2} = f(x_i, y_j) + \frac{h^2}{12}\frac{\partial^4 u}{\partial x^4}(\xi_i, y_j) + \frac{k^2}{12}\frac{\partial^4 u}{\partial y^4}(x_i, \eta_j),$$

for each $i = 1, 2, \ldots, n-1$ and $j = 1, 2, \ldots, m-1$. The boundary conditions are

$$u(x_0, y_j) = g(x_0, y_j) \quad \text{and} \quad u(x_n, y_j) = g(x_n, y_j), \quad \text{for each } j = 0, 1, \ldots, m;$$
$$u(x_i, y_0) = g(x_i, y_0) \quad \text{and} \quad u(x_i, y_m) = g(x_i, y_m), \quad \text{for each } i = 1, 2, \ldots, n-1.$$

Finite-Difference Method

In difference-equation form, this results in the Finite-Difference method:

$$2\left[\left(\frac{h}{k}\right)^2 + 1\right]w_{ij} - (w_{i+1,j} + w_{i-1,j}) - \left(\frac{h}{k}\right)^2(w_{i,j+1} + w_{i,j-1}) = -h^2 f(x_i, y_j), \tag{12.4}$$

for each $i = 1, 2, \ldots, n-1$ and $j = 1, 2, \ldots, m-1$, and

$$w_{0j} = g(x_0, y_j) \quad \text{and} \quad w_{nj} = g(x_n, y_j), \quad \text{for each } j = 0, 1, \ldots, m; \tag{12.5}$$
$$w_{i0} = g(x_i, y_0) \quad \text{and} \quad w_{im} = g(x_i, y_m), \quad \text{for each } i = 1, 2, \ldots, n-1;$$

where $w_{ij}$ approximates $u(x_i, y_j)$. This method has local truncation error of order $O(h^2 + k^2)$.
The typical equation in (12.4) involves approximations to $u(x, y)$ at the points

$$(x_{i-1}, y_j), \quad (x_i, y_j), \quad (x_{i+1}, y_j), \quad (x_i, y_{j-1}), \quad \text{and} \quad (x_i, y_{j+1}).$$

Reproducing the portion of the grid where these points are located (see Figure 12.5)
shows that each equation involves approximations in a star-shaped region about the blue X
at $(x_i, y_j)$.

Figure 12.5

We use the information from the boundary conditions (12.5) whenever appropriate in
the system given by (12.4); that is, at all points $(x_i, y_j)$ adjacent to a boundary mesh point.


Figure 12.6 


Example 1 




This produces an $(n-1)(m-1) \times (n-1)(m-1)$ linear system with the unknowns being
the approximations $w_{i,j}$ to $u(x_i, y_j)$ at the interior mesh points.

The linear system involving these unknowns is expressed for matrix calculations more
efficiently if a relabeling of the interior mesh points is introduced. A recommended labeling
of these points (see [Var1], p. 210) is to let

$$P_l = (x_i, y_j) \quad \text{and} \quad w_l = w_{i,j},$$

where $l = i + (m - 1 - j)(n - 1)$, for each $i = 1, 2, \ldots, n-1$ and $j = 1, 2, \ldots, m-1$.
This labels the mesh points consecutively from left to right and top to bottom. Labeling
the points in this manner ensures that the system needed to determine the $w_{i,j}$ is a banded
matrix with band width at most $2n - 1$.


For example, with n = 4 and m = 5, the relabeling results in a grid whose points are 
shown in Figure 12.6. 
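
The relabeling is easy to tabulate. The sketch below is illustrative only (the function name `label` is an assumption of this note, not notation from the text); it prints the labels for n = 4 and m = 5 with the top row of the grid first.

```python
def label(i, j, n, m):
    """Index l = i + (m - 1 - j)(n - 1) of the interior mesh point (x_i, y_j)."""
    return i + (m - 1 - j) * (n - 1)


n, m = 4, 5
# Rows run from the top of the grid (j = m-1) down to the bottom (j = 1).
grid = [[label(i, j, n, m) for i in range(1, n)] for j in range(m - 1, 0, -1)]
for row in grid:
    print(row)
```

Horizontal neighbors differ in label by 1 and vertical neighbors by n - 1, which is why the nonzero entries of each row of the system stay within n - 1 positions of the diagonal.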


Determine the steady-state heat distribution in a thin square metal plate with dimensions 
0.5 m by 0.5 m using n = m = 4. Two adjacent boundaries are held at 0°C, and the 
heat on the other boundaries increases linearly from 0°C at one corner to 100°C where the 
sides meet. 


Solution Place the sides with the zero boundary conditions along the x- and y-axes. Then
the problem is expressed as

$$\frac{\partial^2 u}{\partial x^2}(x, y) + \frac{\partial^2 u}{\partial y^2}(x, y) = 0,$$

for $(x, y)$ in the set $R = \{(x, y) \mid 0 < x < 0.5,\ 0 < y < 0.5\}$. The boundary conditions are

$$u(0, y) = 0, \quad u(x, 0) = 0, \quad u(x, 0.5) = 200x, \quad \text{and} \quad u(0.5, y) = 200y.$$

If n = m = 4, the problem has the grid given in Figure 12.7, and the difference equation
(12.4) is

$$4w_{i,j} - w_{i+1,j} - w_{i-1,j} - w_{i,j-1} - w_{i,j+1} = 0,$$

for each i = 1, 2, 3 and j = 1, 2, 3.
[Figure 12.7: grid for Example 1, with boundary values u(x, 0.5) = 200x, u(0, y) = 0, and u(0.5, y) = 200y.]

Table 12.1

 i   w_i
 1   18.75
 2   37.50
 3   56.25
 4   12.50
 5   25.00
 6   37.50
 7    6.25
 8   12.50
 9   18.75


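Expressing this in terms of the relabeled interior grid points $w_i = u(P_i)$ implies that
the equations at the points $P_i$ are:

$$
\begin{aligned}
P_1 &: \quad 4w_1 - w_2 - w_4 = w_{0,3} + w_{1,4}, \\
P_2 &: \quad 4w_2 - w_3 - w_1 - w_5 = w_{2,4}, \\
P_3 &: \quad 4w_3 - w_2 - w_6 = w_{4,3} + w_{3,4}, \\
P_4 &: \quad 4w_4 - w_5 - w_1 - w_7 = w_{0,2}, \\
P_5 &: \quad 4w_5 - w_6 - w_4 - w_2 - w_8 = 0, \\
P_6 &: \quad 4w_6 - w_5 - w_3 - w_9 = w_{4,2}, \\
P_7 &: \quad 4w_7 - w_8 - w_4 = w_{0,1} + w_{1,0}, \\
P_8 &: \quad 4w_8 - w_9 - w_7 - w_5 = w_{2,0}, \\
P_9 &: \quad 4w_9 - w_8 - w_6 = w_{3,0} + w_{4,1},
\end{aligned}
$$

where the right sides of the equations are obtained from the boundary conditions.
In fact, the boundary conditions imply that

$$w_{1,0} = w_{2,0} = w_{3,0} = w_{0,1} = w_{0,2} = w_{0,3} = 0,$$
$$w_{1,4} = w_{4,1} = 25, \quad w_{2,4} = w_{4,2} = 50, \quad \text{and} \quad w_{3,4} = w_{4,3} = 75.$$

So the linear system associated with this problem has the form

$$
\begin{bmatrix}
4 & -1 & 0 & -1 & 0 & 0 & 0 & 0 & 0 \\
-1 & 4 & -1 & 0 & -1 & 0 & 0 & 0 & 0 \\
0 & -1 & 4 & 0 & 0 & -1 & 0 & 0 & 0 \\
-1 & 0 & 0 & 4 & -1 & 0 & -1 & 0 & 0 \\
0 & -1 & 0 & -1 & 4 & -1 & 0 & -1 & 0 \\
0 & 0 & -1 & 0 & -1 & 4 & 0 & 0 & -1 \\
0 & 0 & 0 & -1 & 0 & 0 & 4 & -1 & 0 \\
0 & 0 & 0 & 0 & -1 & 0 & -1 & 4 & -1 \\
0 & 0 & 0 & 0 & 0 & -1 & 0 & -1 & 4
\end{bmatrix}
\begin{bmatrix} w_1 \\ w_2 \\ w_3 \\ w_4 \\ w_5 \\ w_6 \\ w_7 \\ w_8 \\ w_9 \end{bmatrix}
=
\begin{bmatrix} 25 \\ 50 \\ 150 \\ 0 \\ 0 \\ 50 \\ 0 \\ 0 \\ 25 \end{bmatrix}.
$$

The values of $w_1, w_2, \ldots, w_9$, found by applying the Gauss-Seidel method to this matrix,
are given in Table 12.1.

These answers are exact, because the true solution, $u(x, y) = 400xy$, has

$$\frac{\partial^4 u}{\partial x^4} = \frac{\partial^4 u}{\partial y^4} \equiv 0,$$

and the truncation error is zero at each step.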
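
The 9 x 9 system of Example 1 is small enough to solve with a few lines of code. The sketch below is a plain Gauss-Seidel iteration written for this note, not the text's Algorithm 7.2; the function name `gauss_seidel`, the zero starting vector, the tolerance, and the iteration cap are all assumptions made here.

```python
A = [
    [4, -1, 0, -1, 0, 0, 0, 0, 0],
    [-1, 4, -1, 0, -1, 0, 0, 0, 0],
    [0, -1, 4, 0, 0, -1, 0, 0, 0],
    [-1, 0, 0, 4, -1, 0, -1, 0, 0],
    [0, -1, 0, -1, 4, -1, 0, -1, 0],
    [0, 0, -1, 0, -1, 4, 0, 0, -1],
    [0, 0, 0, -1, 0, 0, 4, -1, 0],
    [0, 0, 0, 0, -1, 0, -1, 4, -1],
    [0, 0, 0, 0, 0, -1, 0, -1, 4],
]
b = [25, 50, 150, 0, 0, 50, 0, 0, 25]


def gauss_seidel(A, b, tol=1e-10, max_iter=500):
    """Gauss-Seidel sweeps from the zero vector until the largest update <= tol."""
    n = len(b)
    w = [0.0] * n
    for _ in range(max_iter):
        change = 0.0
        for i in range(n):
            # Uses the newest values of w[j] for j < i, as Gauss-Seidel requires.
            s = sum(A[i][j] * w[j] for j in range(n) if j != i)
            new = (b[i] - s) / A[i][i]
            change = max(change, abs(new - w[i]))
            w[i] = new
        if change <= tol:
            return w
    raise RuntimeError("Gauss-Seidel did not converge")


w = gauss_seidel(A, b)
print([round(v, 4) for v in w])
```

The computed values can be compared entry by entry with Table 12.1 and with $400 x_i y_j$ at the corresponding mesh points.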


The problem we considered in Example 1 has the same mesh size, 0.125, on each axis
and requires solving only a 9 x 9 linear system. This simplifies the situation and does not 
introduce the computational problems that are present when the system is larger. Algorithm 
12.1 uses the Gauss-Seidel iterative method for solving the linear system that is produced 
and permits unequal mesh sizes on the axes. 


Poisson Equation Finite-Difference 


To approximate the solution to the Poisson equation

$$\frac{\partial^2 u}{\partial x^2}(x, y) + \frac{\partial^2 u}{\partial y^2}(x, y) = f(x, y), \quad a \le x \le b, \quad c \le y \le d,$$

subject to the boundary conditions

$$u(x, y) = g(x, y) \quad \text{if } x = a \text{ or } x = b \text{ and } c \le y \le d$$

and

$$u(x, y) = g(x, y) \quad \text{if } y = c \text{ or } y = d \text{ and } a \le x \le b:$$

INPUT endpoints $a, b, c, d$; integers $m \ge 3$, $n \ge 3$; tolerance TOL; maximum number of
iterations $N$.

OUTPUT approximations $w_{i,j}$ to $u(x_i, y_j)$ for each $i = 1, \ldots, n-1$ and for each $j = 1, \ldots, m-1$
or a message that the maximum number of iterations was exceeded.

Step 1 Set $h = (b - a)/n$;
$k = (d - c)/m$.

Step 2 For $i = 1, \ldots, n-1$ set $x_i = a + ih$. (Steps 2 and 3 construct mesh points.)
Step 3 For $j = 1, \ldots, m-1$ set $y_j = c + jk$.
Step 4 For $i = 1, \ldots, n-1$
for $j = 1, \ldots, m-1$ set $w_{i,j} = 0$.
Step 5 Set $\lambda = h^2/k^2$;
$\mu = 2(1 + \lambda)$;
$l = 1$.

Step 6 While $l \le N$ do Steps 7-20. (Steps 7-20 perform Gauss-Seidel iterations.)

Step 7 Set $z = \left(-h^2 f(x_1, y_{m-1}) + g(a, y_{m-1}) + \lambda g(x_1, d) + \lambda w_{1,m-2} + w_{2,m-1}\right)/\mu$;
NORM $= |z - w_{1,m-1}|$;
$w_{1,m-1} = z$.

Step 8 For $i = 2, \ldots, n-2$
set $z = \left(-h^2 f(x_i, y_{m-1}) + \lambda g(x_i, d) + w_{i-1,m-1} + w_{i+1,m-1} + \lambda w_{i,m-2}\right)/\mu$;
if $|w_{i,m-1} - z| >$ NORM then set NORM $= |w_{i,m-1} - z|$;
set $w_{i,m-1} = z$.

Step 9 Set $z = \left(-h^2 f(x_{n-1}, y_{m-1}) + g(b, y_{m-1}) + \lambda g(x_{n-1}, d) + w_{n-2,m-1} + \lambda w_{n-1,m-2}\right)/\mu$;
if $|w_{n-1,m-1} - z| >$ NORM then set NORM $= |w_{n-1,m-1} - z|$;
set $w_{n-1,m-1} = z$.

Step 10 For $j = m-2, \ldots, 2$ do Steps 11, 12, and 13.
Step 11 Set $z = \left(-h^2 f(x_1, y_j) + g(a, y_j) + \lambda w_{1,j+1} + \lambda w_{1,j-1} + w_{2,j}\right)/\mu$;
if $|w_{1,j} - z| >$ NORM then set NORM $= |w_{1,j} - z|$;
set $w_{1,j} = z$.
Step 12 For $i = 2, \ldots, n-2$
set $z = \left(-h^2 f(x_i, y_j) + w_{i-1,j} + \lambda w_{i,j+1} + w_{i+1,j} + \lambda w_{i,j-1}\right)/\mu$;
if $|w_{i,j} - z| >$ NORM then set NORM $= |w_{i,j} - z|$;
set $w_{i,j} = z$.
Step 13 Set $z = \left(-h^2 f(x_{n-1}, y_j) + g(b, y_j) + w_{n-2,j} + \lambda w_{n-1,j+1} + \lambda w_{n-1,j-1}\right)/\mu$;
if $|w_{n-1,j} - z| >$ NORM then set NORM $= |w_{n-1,j} - z|$;
set $w_{n-1,j} = z$.

Step 14 Set $z = \left(-h^2 f(x_1, y_1) + g(a, y_1) + \lambda g(x_1, c) + \lambda w_{1,2} + w_{2,1}\right)/\mu$;
if $|w_{1,1} - z| >$ NORM then set NORM $= |w_{1,1} - z|$;
set $w_{1,1} = z$.
Step 15 For $i = 2, \ldots, n-2$
set $z = \left(-h^2 f(x_i, y_1) + \lambda g(x_i, c) + w_{i-1,1} + \lambda w_{i,2} + w_{i+1,1}\right)/\mu$;
if $|w_{i,1} - z| >$ NORM then set NORM $= |w_{i,1} - z|$;
set $w_{i,1} = z$.
Step 16 Set $z = \left(-h^2 f(x_{n-1}, y_1) + g(b, y_1) + \lambda g(x_{n-1}, c) + w_{n-2,1} + \lambda w_{n-1,2}\right)/\mu$;
if $|w_{n-1,1} - z| >$ NORM then set NORM $= |w_{n-1,1} - z|$;
set $w_{n-1,1} = z$.

Step 17 If NORM $\le$ TOL then do Steps 18 and 19.

Step 18 For $i = 1, \ldots, n-1$
for $j = 1, \ldots, m-1$ OUTPUT $(x_i, y_j, w_{i,j})$.

Step 19 STOP. (The procedure was successful.)
Step 20 Set $l = l + 1$.

Step 21 OUTPUT (‘Maximum number of iterations exceeded’);
(The procedure was unsuccessful.)
STOP.
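
A compact way to experiment with this procedure is sketched below. It is not a line-by-line transcription of Algorithm 12.1: instead of treating corner and edge points in separate steps, the sketch stores the boundary values in the same array as the unknowns so that one update formula covers every interior point. The sweep order (top row first, left to right) and the stopping test follow Steps 6-20. The function name `poisson_fd` and its argument order are assumptions of this note.

```python
import math


def poisson_fd(f, g, a, b, c, d, n, m, tol=1e-10, max_iter=1000):
    """Gauss-Seidel solution of the five-point equations (12.4) on [a,b] x [c,d]."""
    h = (b - a) / n
    k = (d - c) / m
    x = [a + i * h for i in range(n + 1)]
    y = [c + j * k for j in range(m + 1)]
    lam = h * h / (k * k)
    mu = 2.0 * (1.0 + lam)
    # w[i][j] holds boundary data on the edges and the iterate in the interior.
    w = [[0.0] * (m + 1) for _ in range(n + 1)]
    for j in range(m + 1):
        w[0][j] = g(x[0], y[j])
        w[n][j] = g(x[n], y[j])
    for i in range(n + 1):
        w[i][0] = g(x[i], y[0])
        w[i][m] = g(x[i], y[m])
    for it in range(1, max_iter + 1):
        norm = 0.0
        for j in range(m - 1, 0, -1):  # top row first
            for i in range(1, n):      # left to right
                z = (-h * h * f(x[i], y[j]) + w[i - 1][j] + w[i + 1][j]
                     + lam * (w[i][j + 1] + w[i][j - 1])) / mu
                norm = max(norm, abs(z - w[i][j]))
                w[i][j] = z
        if norm <= tol:
            return x, y, w, it
    raise RuntimeError("Maximum number of iterations exceeded")


# Example 1: Laplace's equation on the 0.5 m plate; g = 400xy matches all four edges.
x1, y1, w1, it1 = poisson_fd(lambda x, y: 0.0, lambda x, y: 400.0 * x * y,
                             0.0, 0.5, 0.0, 0.5, 4, 4)
print(w1[1][3], w1[2][2], w1[3][1], it1)
```

Because the interior is initialized to zero and swept in the same order as the algorithm, the iterates of this sketch should track those of Algorithm 12.1, but that equivalence is an expectation of the sketch rather than something stated in the text.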


Although the Gauss-Seidel iterative procedure is incorporated into Algorithm 12.1 for
simplicity, it is advisable to use a direct technique such as Gaussian elimination when the
system is small, on the order of 100 or less, because the positive definiteness ensures stability
with respect to round-off errors. In particular, a generalization of the Crout Factorization
Algorithm 6.7 (see [Var1], p. 221), is efficient for solving this system because the matrix is
in the symmetric-block tridiagonal form

$$
A = \begin{bmatrix}
A_1 & C_1 & 0 & \cdots & 0 \\
C_1 & A_2 & C_2 & \ddots & \vdots \\
0 & C_2 & \ddots & \ddots & 0 \\
\vdots & \ddots & \ddots & \ddots & C_{m-2} \\
0 & \cdots & 0 & C_{m-2} & A_{m-1}
\end{bmatrix},
$$

with square blocks of size $(n-1) \times (n-1)$.


Choice of Iterative Method 
For large systems, an iterative method should be used, specifically the SOR method discussed
in Algorithm 7.3. The choice of $\omega$ that is optimal in this situation comes from the
fact that when $A$ is decomposed into its diagonal $D$ and upper- and lower-triangular parts
$U$ and $L$,

$$A = D - L - U,$$

and $B$ is the matrix for the Jacobi method,

$$B = D^{-1}(L + U),$$

then the spectral radius of $B$ is (see [Var1])

$$\rho(B) = \frac{1}{2}\left[\cos\left(\frac{\pi}{m}\right) + \cos\left(\frac{\pi}{n}\right)\right].$$

The value of $\omega$ to be used is, consequently,

$$\omega = \frac{2}{1 + \sqrt{1 - [\rho(B)]^2}} = \frac{4}{2 + \sqrt{4 - \left[\cos\left(\frac{\pi}{m}\right) + \cos\left(\frac{\pi}{n}\right)\right]^2}}.$$

A block technique can be incorporated into the algorithm for faster convergence of the SOR
procedure. For a presentation of this technique, see [Var1], pp. 219-223.
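
For a given grid, the relaxation parameter can be evaluated directly from the two formulas above. The short sketch below does only that; the function name `optimal_omega` is an assumption of this note, and the values n = 6, m = 5 are chosen merely as sample input.

```python
import math


def optimal_omega(n, m):
    """Return (rho, omega) from the spectral-radius formula quoted in the text."""
    rho = 0.5 * (math.cos(math.pi / m) + math.cos(math.pi / n))
    omega = 2.0 / (1.0 + math.sqrt(1.0 - rho * rho))
    return rho, omega


rho, omega = optimal_omega(6, 5)

# The second form of the formula should give the same number.
s = math.cos(math.pi / 5) + math.cos(math.pi / 6)
omega_alt = 4.0 / (2.0 + math.sqrt(4.0 - s * s))
print(rho, omega, omega_alt)
```

Since $0 < \rho(B) < 1$, the resulting $\omega$ always lies strictly between 1 and 2, which is the over-relaxation range.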


Use the Poisson finite-difference method with $n = 6$, $m = 5$, and a tolerance of $10^{-10}$ to
approximate the solution to

$$\frac{\partial^2 u}{\partial x^2}(x, y) + \frac{\partial^2 u}{\partial y^2}(x, y) = xe^y, \quad 0 < x < 2, \quad 0 < y < 1,$$

with the boundary conditions

$$u(0, y) = 0, \quad u(2, y) = 2e^y, \quad 0 \le y \le 1,$$

$$u(x, 0) = x, \quad u(x, 1) = ex, \quad 0 \le x \le 2,$$

and compare the results with the exact solution $u(x, y) = xe^y$.
Solution Using Algorithm 12.1 with a maximum number of iterations set at $N = 100$ gives
the results in Table 12.2. The stopping criterion for the Gauss-Seidel method in Step 17
requires that

$$\left| w_{ij}^{(l)} - w_{ij}^{(l-1)} \right| \le 10^{-10},$$

for each $i = 1, \ldots, 5$ and $j = 1, \ldots, 4$. The solution to the difference equation was accurately
obtained, and the procedure stopped at $l = 61$. The results, along with the correct values,
are presented in Table 12.2.

Table 12.2

 i  j  x_i     y_j     w_ij^(61)  u(x_i, y_j)  |u(x_i, y_j) - w_ij^(61)|
 1  1  0.3333  0.2000  0.40726    0.40713      1.30 x 10^-4
 1  2  0.3333  0.4000  0.49748    0.49727      2.08 x 10^-4
 1  3  0.3333  0.6000  0.60760    0.60737      2.23 x 10^-4
 1  4  0.3333  0.8000  0.74201    0.74185      1.60 x 10^-4
 2  1  0.6667  0.2000  0.81452    0.81427      2.55 x 10^-4
 2  2  0.6667  0.4000  0.99496    0.99455      4.08 x 10^-4
 2  3  0.6667  0.6000  1.2152     1.2147       4.37 x 10^-4
 2  4  0.6667  0.8000  1.4840     1.4837       3.15 x 10^-4
 3  1  1.0000  0.2000  1.2218     1.2214       3.64 x 10^-4
 3  2  1.0000  0.4000  1.4924     1.4918       5.80 x 10^-4
 3  3  1.0000  0.6000  1.8227     1.8221       6.24 x 10^-4
 3  4  1.0000  0.8000  2.2260     2.2255       4.51 x 10^-4
 4  1  1.3333  0.2000  1.6290     1.6285       4.27 x 10^-4
 4  2  1.3333  0.4000  1.9898     1.9891       6.79 x 10^-4
 4  3  1.3333  0.6000  2.4302     2.4295       7.35 x 10^-4
 4  4  1.3333  0.8000  2.9679     2.9674       5.40 x 10^-4
 5  1  1.6667  0.2000  2.0360     2.0357       3.71 x 10^-4
 5  2  1.6667  0.4000  2.4870     2.4864       5.84 x 10^-4
 5  3  1.6667  0.6000  3.0375     3.0369       6.41 x 10^-4
 5  4  1.6667  0.8000  3.7097     3.7092       4.89 x 10^-4


EXERCISE SET 12.1 


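1. Use Algorithm 12.1 to approximate the solution to the elliptic partial differential equation

\[ \frac{\partial^2 u}{\partial x^2} + \frac{\partial^2 u}{\partial y^2} = 4, \quad 0 < x < 1, \quad 0 < y < 2; \]
\[ u(x, 0) = x^2, \quad u(x, 2) = (x - 2)^2, \quad 0 \le x \le 1; \]
\[ u(0, y) = y^2, \quad u(1, y) = (y - 1)^2, \quad 0 \le y \le 2. \]

Use $h = k = \frac{1}{2}$, and compare the results to the actual solution $u(x, y) = (x - y)^2$.

2. Use Algorithm 12.1 to approximate the solution to the elliptic partial differential equation

\[ \frac{\partial^2 u}{\partial x^2} + \frac{\partial^2 u}{\partial y^2} = 0, \quad 1 < x < 2, \quad 0 < y < 1; \]
\[ u(x, 0) = 2 \ln x, \quad u(x, 1) = \ln(x^2 + 1), \quad 1 \le x \le 2; \]
\[ u(1, y) = \ln(y^2 + 1), \quad u(2, y) = \ln(y^2 + 4), \quad 0 \le y \le 1. \]

Use $h = k = \frac{1}{3}$, and compare the results to the actual solution $u(x, y) = \ln(x^2 + y^2)$.

3. Approximate the solutions to the following elliptic partial differential equations, using Algorithm 12.1:

a. \[ \frac{\partial^2 u}{\partial x^2} + \frac{\partial^2 u}{\partial y^2} = 0, \quad 0 < x < 1, \quad 0 < y < 1; \]
\[ u(x, 0) = 0, \quad u(x, 1) = x, \quad 0 \le x \le 1; \]
\[ u(0, y) = 0, \quad u(1, y) = y, \quad 0 \le y \le 1. \]
Use $h = k = 0.2$, and compare the results to the actual solution $u(x, y) = xy$.

b. \[ \frac{\partial^2 u}{\partial x^2} + \frac{\partial^2 u}{\partial y^2} = -(\cos(x + y) + \cos(x - y)), \quad 0 < x < \pi, \quad 0 < y < \frac{\pi}{2}; \]
\[ u(0, y) = \cos y, \quad u(\pi, y) = -\cos y, \quad 0 \le y \le \frac{\pi}{2}; \]
\[ u(x, 0) = \cos x, \quad u\left(x, \frac{\pi}{2}\right) = 0, \quad 0 \le x \le \pi. \]
Use $h = \pi/5$ and $k = \pi/10$, and compare the results to the actual solution $u(x, y) = \cos x \cos y$.

c. \[ \frac{\partial^2 u}{\partial x^2} + \frac{\partial^2 u}{\partial y^2} = (x^2 + y^2) e^{xy}, \quad 0 < x < 2, \quad 0 < y < 1; \]
\[ u(0, y) = 1, \quad u(2, y) = e^{2y}, \quad 0 \le y \le 1; \]
\[ u(x, 0) = 1, \quad u(x, 1) = e^{x}, \quad 0 \le x \le 2. \]
Use $h = 0.2$ and $k = 0.1$, and compare the results to the actual solution $u(x, y) = e^{xy}$.

d. \[ \frac{\partial^2 u}{\partial x^2} + \frac{\partial^2 u}{\partial y^2} = \frac{x}{y} + \frac{y}{x}, \quad 1 < x < 2, \quad 1 < y < 2; \]
\[ u(x, 1) = x \ln x, \quad u(x, 2) = x \ln(4x^2), \quad 1 \le x \le 2; \]
\[ u(1, y) = y \ln y, \quad u(2, y) = 2y \ln(2y), \quad 1 \le y \le 2. \]
Use $h = k = 0.1$, and compare the results to the actual solution $u(x, y) = xy \ln xy$.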


4. Repeat Exercise 3(a) using extrapolation with $h_0 = 0.2$, $h_1 = h_0/2$, and $h_2 = h_0/4$.

5. Construct an algorithm similar to Algorithm 12.1, except use the SOR method with optimal $\omega$ instead
of the Gauss-Seidel method for solving the linear system. 

6. Repeat Exercise 3 using the algorithm constructed in Exercise 5. 

7. A coaxial cable is made of a 0.1-in.-square inner conductor and a 0.5-in.-square outer conductor. 
The potential at a point in the cross section of the cable is described by Laplace’s equation. Suppose 
the inner conductor is kept at 0 volts and the outer conductor is kept at 110 volts. Find the potential 
between the two conductors by placing a grid with horizontal mesh spacing h = 0.1 in. and vertical 
mesh spacing k = 0.1 in. on the region 


\[ D = \{ (x, y) \mid 0 \le x, y \le 0.5 \}. \]


Approximate the solution to Laplace’s equation at each grid point, and use the two sets of boundary 
conditions to derive a linear system to be solved by the Gauss-Seidel method. 

8. A 6-cm by 5-cm rectangular silver plate has heat being uniformly generated at each point at the rate $q = 1.5$ cal/cm$^3$·s. Let $x$ represent the distance along the edge of the plate of length 6 cm and $y$ be the distance along the edge of the plate of length 5 cm. Suppose the temperature $u$ along the edges is kept at the following temperatures:

\[ u(x, 0) = x(6 - x), \quad u(x, 5) = 0, \quad 0 \le x \le 6, \]
\[ u(0, y) = y(5 - y), \quad u(6, y) = 0, \quad 0 \le y \le 5, \]

where the origin lies at a corner of the plate with coordinates (0, 0) and the edges lie along the positive x- and y-axes. The steady-state temperature $u = u(x, y)$ satisfies Poisson's equation:

\[ \frac{\partial^2 u}{\partial x^2}(x, y) + \frac{\partial^2 u}{\partial y^2}(x, y) = -\frac{q}{K}, \quad 0 < x < 6, \quad 0 < y < 5, \]

where $K$, the thermal conductivity, is 1.04 cal/cm·deg·s. Approximate the temperature $u(x, y)$ using Algorithm 12.1 with $h = 0.4$ and $k = \frac{1}{3}$.


12.2 Parabolic Partial Differential Equations 


The parabolic partial differential equation we consider is the heat, or diffusion, equation 


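\[ \frac{\partial u}{\partial t}(x, t) = \alpha^2 \frac{\partial^2 u}{\partial x^2}(x, t), \quad 0 < x < l, \quad t > 0, \tag{12.6} \]

subject to the conditions

\[ u(0, t) = u(l, t) = 0, \quad t > 0, \quad \text{and} \quad u(x, 0) = f(x), \quad 0 \le x \le l. \]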


The approach we use to approximate the solution to this problem involves finite differences 
and is similar to the method used in Section 12.1. 

First select an integer $m > 0$ and define the x-axis step size $h = l/m$. Then select a time-step size $k$. The grid points for this situation are $(x_i, t_j)$, where $x_i = ih$, for $i = 0, 1, \ldots, m$, and $t_j = jk$, for $j = 0, 1, \ldots$.


Forward Difference Method 
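We obtain the difference method using the Taylor series in $t$ to form the difference quotient

\[ \frac{\partial u}{\partial t}(x_i, t_j) = \frac{u(x_i, t_j + k) - u(x_i, t_j)}{k} - \frac{k}{2} \frac{\partial^2 u}{\partial t^2}(x_i, \mu_j), \tag{12.7} \]

for some $\mu_j \in (t_j, t_{j+1})$, and the Taylor series in $x$ to form the difference quotient

\[ \frac{\partial^2 u}{\partial x^2}(x_i, t_j) = \frac{u(x_i + h, t_j) - 2u(x_i, t_j) + u(x_i - h, t_j)}{h^2} - \frac{h^2}{12} \frac{\partial^4 u}{\partial x^4}(\xi_i, t_j), \tag{12.8} \]

where $\xi_i \in (x_{i-1}, x_{i+1})$.

The parabolic partial differential equation (12.6) implies that at interior gridpoints $(x_i, t_j)$, for each $i = 1, 2, \ldots, m - 1$ and $j = 1, 2, \ldots$, we have

\[ \frac{\partial u}{\partial t}(x_i, t_j) - \alpha^2 \frac{\partial^2 u}{\partial x^2}(x_i, t_j) = 0, \]

so the difference method using the difference quotients (12.7) and (12.8) is

\[ \frac{w_{i,j+1} - w_{ij}}{k} - \alpha^2 \frac{w_{i+1,j} - 2w_{ij} + w_{i-1,j}}{h^2} = 0, \tag{12.9} \]

where $w_{ij}$ approximates $u(x_i, t_j)$.

The local truncation error for this difference equation is

\[ \tau_{ij} = \frac{k}{2} \frac{\partial^2 u}{\partial t^2}(x_i, \mu_j) - \alpha^2 \frac{h^2}{12} \frac{\partial^4 u}{\partial x^4}(\xi_i, t_j). \tag{12.10} \]

Solving Eq. (12.9) for $w_{i,j+1}$ gives

\[ w_{i,j+1} = \left( 1 - \frac{2\alpha^2 k}{h^2} \right) w_{ij} + \alpha^2 \frac{k}{h^2} (w_{i+1,j} + w_{i-1,j}), \tag{12.11} \]

for each $i = 1, 2, \ldots, m - 1$ and $j = 1, 2, \ldots$.

So we have

\[ w_{0,0} = f(x_0), \quad w_{1,0} = f(x_1), \quad \ldots, \quad w_{m,0} = f(x_m). \]

Then we generate the next $t$-row by

\[ w_{0,1} = u(0, t_1) = 0; \]
\[ w_{1,1} = \left( 1 - \frac{2\alpha^2 k}{h^2} \right) w_{1,0} + \alpha^2 \frac{k}{h^2} (w_{2,0} + w_{0,0}); \]
\[ w_{2,1} = \left( 1 - \frac{2\alpha^2 k}{h^2} \right) w_{2,0} + \alpha^2 \frac{k}{h^2} (w_{3,0} + w_{1,0}); \]
\[ \vdots \]
\[ w_{m-1,1} = \left( 1 - \frac{2\alpha^2 k}{h^2} \right) w_{m-1,0} + \alpha^2 \frac{k}{h^2} (w_{m,0} + w_{m-2,0}); \]
\[ w_{m,1} = u(l, t_1) = 0. \]

Now we can use the $w_{i,1}$ values to generate all the $w_{i,2}$ values and so on.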
The explicit nature of the difference method implies that the $(m - 1) \times (m - 1)$ matrix associated with this system can be written in the tridiagonal form

\[ A = \begin{bmatrix}
(1 - 2\lambda) & \lambda & 0 & \cdots & 0 \\
\lambda & (1 - 2\lambda) & \lambda & \ddots & \vdots \\
0 & \ddots & \ddots & \ddots & 0 \\
\vdots & \ddots & \lambda & (1 - 2\lambda) & \lambda \\
0 & \cdots & 0 & \lambda & (1 - 2\lambda)
\end{bmatrix}, \]

where $\lambda = \alpha^2 (k/h^2)$. If we let

\[ \mathbf{w}^{(0)} = (f(x_1), f(x_2), \ldots, f(x_{m-1}))^t \]

and

\[ \mathbf{w}^{(j)} = (w_{1j}, w_{2j}, \ldots, w_{m-1,j})^t, \quad \text{for each } j = 1, 2, \ldots, \]

then the approximate solution is given by

\[ \mathbf{w}^{(j)} = A \mathbf{w}^{(j-1)}, \quad \text{for each } j = 1, 2, \ldots, \]

so $\mathbf{w}^{(j)}$ is obtained from $\mathbf{w}^{(j-1)}$ by a simple matrix multiplication. This is known as the Forward-Difference method, and the approximation at the cyan point shown in Figure 12.8 uses information from the other points marked on that figure. If the solution to the partial differential equation has four continuous partial derivatives in $x$ and two in $t$, then Eq. (12.10) implies that the method is of order $O(k + h^2)$.

Figure 12.8


[Figure: mesh points used by the Forward-Difference method]
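The update in Eq. (12.11) can be sketched in a few lines of Python. This is only an illustrative sketch, not the text's own code: the function name `forward_difference` and its argument order are assumptions made here, and the run below uses the data of part (a) of the example that follows ($\alpha = 1$, $l = 1$, $f(x) = \sin \pi x$, $h = 0.1$, $k = 0.0005$).

```python
import math


def forward_difference(f, l, alpha, m, k, n_steps):
    """Apply w^(j) = A w^(j-1) for n_steps time steps (hypothetical helper).

    Returns (lam, w), where w[i] approximates u(x_i, n_steps * k).
    """
    h = l / m
    lam = alpha**2 * k / h**2
    # Initial row w_{i,0} = f(x_i); the boundary values are held at zero.
    w = [f(i * h) for i in range(m + 1)]
    w[0] = w[m] = 0.0
    for _ in range(n_steps):
        new = [0.0] * (m + 1)
        for i in range(1, m):
            # Eq. (12.11) written with lam = alpha^2 k / h^2.
            new[i] = (1 - 2 * lam) * w[i] + lam * (w[i + 1] + w[i - 1])
        w = new
    return lam, w


if __name__ == "__main__":
    lam, w = forward_difference(lambda x: math.sin(math.pi * x),
                                1.0, 1.0, 10, 0.0005, 1000)
    exact = math.exp(-math.pi**2 * 0.5) * math.sin(math.pi * 0.5)
    print("lambda =", lam)
    print("w at x = 0.5, t = 0.5:", w[5], " error:", abs(w[5] - exact))
```

With these inputs the value at $x = 0.5$ should agree with the corresponding $k = 0.0005$ entry of Table 12.3 to the digits shown there. Rerunning with `k = 0.01` and `n_steps = 50` gives $\lambda = 1$, the unstable case discussed under Stability Considerations.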


Example 1 Use step sizes (a) $h = 0.1$ and $k = 0.0005$ and (b) $h = 0.1$ and $k = 0.01$ to approximate the solution to the heat equation

\[ \frac{\partial u}{\partial t}(x, t) - \frac{\partial^2 u}{\partial x^2}(x, t) = 0, \quad 0 < x < 1, \quad 0 \le t, \]

with boundary conditions

\[ u(0, t) = u(1, t) = 0, \quad 0 < t, \]

and initial conditions

\[ u(x, 0) = \sin(\pi x), \quad 0 \le x \le 1. \]

Compare the results at $t = 0.5$ to the exact solution

\[ u(x, t) = e^{-\pi^2 t} \sin(\pi x). \]

Solution (a) The Forward-Difference method with $h = 0.1$, $k = 0.0005$, and $\lambda = (1)^2 (0.0005/(0.1)^2) = 0.05$ gives the results in the third column of Table 12.3. As can be seen from the fourth column, these results are quite accurate.

(b) The Forward-Difference method with $h = 0.1$, $k = 0.01$, and $\lambda = (1)^2 (0.01/(0.1)^2) = 1$ gives the results in the fifth column of Table 12.3. As can be seen from the sixth column, these results are worthless.


Stability Considerations 


A truncation error of order $O(k + h^2)$ is expected in Example 1. Although this is obtained with $h = 0.1$ and $k = 0.0005$, it certainly is not obtained when $h = 0.1$ and $k = 0.01$. To explain the difficulty, we need to look at the stability of the Forward-Difference method.

Suppose that an error $\mathbf{e}^{(0)} = (e_1^{(0)}, e_2^{(0)}, \ldots, e_{m-1}^{(0)})^t$ is made in representing the initial data

\[ \mathbf{w}^{(0)} = (f(x_1), f(x_2), \ldots, f(x_{m-1}))^t \]

(or in any particular step; the choice of the initial step is simply for convenience). An error of $A\mathbf{e}^{(0)}$ propagates in $\mathbf{w}^{(1)}$, because

\[ \mathbf{w}^{(1)} = A(\mathbf{w}^{(0)} + \mathbf{e}^{(0)}) = A\mathbf{w}^{(0)} + A\mathbf{e}^{(0)}. \]

This process continues. At the $n$th time step, the error in $\mathbf{w}^{(n)}$ due to $\mathbf{e}^{(0)}$ is $A^n \mathbf{e}^{(0)}$. The method is consequently stable precisely when these errors do not grow as $n$ increases. But this is true if and only if for any initial error $\mathbf{e}^{(0)}$, we have $\|A^n \mathbf{e}^{(0)}\| \le \|\mathbf{e}^{(0)}\|$ for all $n$. Hence, we must have $\|A^n\| \le 1$, a condition that, by Theorem 7.15 on page 446, requires that $\rho(A^n) = (\rho(A))^n \le 1$. The Forward-Difference method is therefore stable only if $\rho(A) \le 1$.

Table 12.3

x_i   u(x_i, 0.5)   w_{i,1000}     |u(x_i, 0.5) - w_{i,1000}|   w_{i,50}            |u(x_i, 0.5) - w_{i,50}|
                    (k = 0.0005)                                 (k = 0.01)
0.0   0             0                                            0
0.1   0.00222241    0.00228652     6.411 x 10^-5                 8.19876 x 10^7      8.199 x 10^7
0.2   0.00422728    0.00434922     1.219 x 10^-4                 -1.55719 x 10^8     1.557 x 10^8
0.3   0.00581836    0.00598619     1.678 x 10^-4                 2.13833 x 10^8      2.138 x 10^8
0.4   0.00683989    0.00703719     1.973 x 10^-4                 -2.50642 x 10^8     2.506 x 10^8
0.5   0.00719188    0.00739934     2.075 x 10^-4                 2.62685 x 10^8      2.627 x 10^8
0.6   0.00683989    0.00703719     1.973 x 10^-4                 -2.49015 x 10^8     2.490 x 10^8
0.7   0.00581836    0.00598619     1.678 x 10^-4                 2.11200 x 10^8      2.112 x 10^8
0.8   0.00422728    0.00434922     1.219 x 10^-4                 -1.53086 x 10^8     1.531 x 10^8
0.9   0.00222241    0.00228652     6.411 x 10^-5                 8.03604 x 10^7      8.036 x 10^7
1.0   0             0                                            0

The eigenvalues of $A$ can be shown (see Exercise 13) to be

\[ \mu_i = 1 - 4\lambda \left( \sin \frac{i\pi}{2m} \right)^2, \quad \text{for each } i = 1, 2, \ldots, m - 1. \]

So the condition for stability consequently reduces to determining whether

\[ \rho(A) = \max_{1 \le i \le m-1} \left| 1 - 4\lambda \left( \sin \frac{i\pi}{2m} \right)^2 \right| \le 1, \]

and this simplifies to

\[ 0 \le \lambda \left( \sin \frac{i\pi}{2m} \right)^2 \le \frac{1}{2}, \quad \text{for each } i = 1, 2, \ldots, m - 1. \]

Stability requires that this inequality condition hold as $h \to 0$, or, equivalently, as $m \to \infty$. The fact that

\[ \lim_{m \to \infty} \left[ \sin \left( \frac{(m - 1)\pi}{2m} \right) \right]^2 = 1 \]

means that stability will occur only if $0 \le \lambda \le \frac{1}{2}$.

By definition $\lambda = \alpha^2 (k/h^2)$, so this inequality requires that $h$ and $k$ be chosen so that

\[ \alpha^2 \frac{k}{h^2} \le \frac{1}{2}. \]

In Example 1 we have $\alpha^2 = 1$, so this condition is satisfied when $h = 0.1$ and $k = 0.0005$. But when $k$ was increased to 0.01 with no corresponding increase in $h$, the ratio was

\[ \frac{0.01}{(0.1)^2} = 1 > \frac{1}{2}, \]

and stability problems became immediately apparent and dramatic.

Consistent with the terminology of Chapter 5, we call the Forward-Difference method conditionally stable. The method converges to the solution of Eq. (12.6) with rate of convergence $O(k + h^2)$, provided

\[ \alpha^2 \frac{k}{h^2} \le \frac{1}{2} \]

and the required continuity conditions on the solution are met. (For a detailed proof of this fact, see [IK, pp. 502-505].)
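The stability condition can be checked numerically from the eigenvalue formula $\mu_i = 1 - 4\lambda(\sin(i\pi/(2m)))^2$. The following Python sketch is illustrative only; the helper names `forward_eigenvalues` and `spectral_radius` are assumptions introduced here, not part of the text.

```python
import math


def forward_eigenvalues(lam, m):
    """Eigenvalues mu_i, i = 1..m-1, of the Forward-Difference matrix A."""
    return [1 - 4 * lam * math.sin(i * math.pi / (2 * m)) ** 2
            for i in range(1, m)]


def spectral_radius(lam, m):
    """rho(A) = max |mu_i| over the m - 1 eigenvalues."""
    return max(abs(mu) for mu in forward_eigenvalues(lam, m))


if __name__ == "__main__":
    m = 10  # h = 0.1 on 0 < x < 1
    for k in (0.0005, 0.01):
        lam = k / 0.1**2  # alpha = 1
        rho = spectral_radius(lam, m)
        verdict = "stable" if rho <= 1 else "unstable"
        print(f"k = {k}: lambda = {lam:.3f}, rho(A) = {rho:.5f} ({verdict})")
```

For the two step sizes of Example 1, the first choice gives a spectral radius below 1, and the second gives a spectral radius well above 1, so errors are amplified at every step.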


Backward-Difference Method 


To obtain a method that is unconditionally stable, we consider an implicit-difference method that results from using the backward-difference quotient for $(\partial u/\partial t)(x_i, t_j)$ in the form

\[ \frac{\partial u}{\partial t}(x_i, t_j) = \frac{u(x_i, t_j) - u(x_i, t_{j-1})}{k} + \frac{k}{2} \frac{\partial^2 u}{\partial t^2}(x_i, \mu_j), \]

where $\mu_j$ is in $(t_{j-1}, t_j)$. Substituting this equation, together with Eq. (12.8) for $\partial^2 u/\partial x^2$, into the partial differential equation gives

\[ \frac{u(x_i, t_j) - u(x_i, t_{j-1})}{k} - \alpha^2 \frac{u(x_{i+1}, t_j) - 2u(x_i, t_j) + u(x_{i-1}, t_j)}{h^2} = -\frac{k}{2} \frac{\partial^2 u}{\partial t^2}(x_i, \mu_j) - \alpha^2 \frac{h^2}{12} \frac{\partial^4 u}{\partial x^4}(\xi_i, t_j), \]

for some $\xi_i \in (x_{i-1}, x_{i+1})$. The Backward-Difference method that results is

\[ \frac{w_{ij} - w_{i,j-1}}{k} - \alpha^2 \frac{w_{i+1,j} - 2w_{ij} + w_{i-1,j}}{h^2} = 0, \tag{12.12} \]

for each $i = 1, 2, \ldots, m - 1$ and $j = 1, 2, \ldots$.

The Backward-Difference method involves the mesh points $(x_i, t_{j-1})$, $(x_{i-1}, t_j)$, and $(x_{i+1}, t_j)$ to approximate the value at $(x_i, t_j)$, as illustrated in Figure 12.9.


Figure 12.9

[Figure: mesh points used by the Backward-Difference method]

Since the boundary and initial conditions associated with the problem give information at the circled mesh points, the figure shows that no explicit procedures can be used to solve Eq. (12.12). Recall that in the Forward-Difference method (see Figure 12.10), approximations at $(x_{i-1}, t_{j-1})$, $(x_i, t_{j-1})$, and $(x_{i+1}, t_{j-1})$ were used to find the approximation at $(x_i, t_j)$. So an explicit method could be used to find the approximations, based on the information from the initial and boundary conditions.


Figure 12.10

[Figure: mesh points used by the Forward-Difference method]


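If we again let $\lambda$ denote the quantity $\alpha^2 (k/h^2)$, the Backward-Difference method becomes

\[ (1 + 2\lambda) w_{ij} - \lambda w_{i+1,j} - \lambda w_{i-1,j} = w_{i,j-1}, \]

for each $i = 1, 2, \ldots, m - 1$ and $j = 1, 2, \ldots$. Using the knowledge that $w_{i,0} = f(x_i)$, for each $i = 1, 2, \ldots, m - 1$, and $w_{m,j} = w_{0,j} = 0$, for each $j = 1, 2, \ldots$, this difference method has the matrix representation

\[ \begin{bmatrix}
(1 + 2\lambda) & -\lambda & 0 & \cdots & 0 \\
-\lambda & (1 + 2\lambda) & -\lambda & \ddots & \vdots \\
0 & \ddots & \ddots & \ddots & 0 \\
\vdots & \ddots & -\lambda & (1 + 2\lambda) & -\lambda \\
0 & \cdots & 0 & -\lambda & (1 + 2\lambda)
\end{bmatrix}
\begin{bmatrix} w_{1,j} \\ w_{2,j} \\ \vdots \\ w_{m-1,j} \end{bmatrix}
=
\begin{bmatrix} w_{1,j-1} \\ w_{2,j-1} \\ \vdots \\ w_{m-1,j-1} \end{bmatrix}, \tag{12.13} \]

or $A\mathbf{w}^{(j)} = \mathbf{w}^{(j-1)}$, for each $j = 1, 2, \ldots$.

Hence, we must now solve a linear system to obtain $\mathbf{w}^{(j)}$ from $\mathbf{w}^{(j-1)}$. However, $\lambda > 0$, so the matrix $A$ is positive definite and strictly diagonally dominant, as well as being tridiagonal. We can consequently use either the Crout Factorization Algorithm 6.7 or the SOR Algorithm 7.3 to solve this system. Algorithm 12.2 solves (12.13) using Crout factorization, which is acceptable unless $m$ is large. In this algorithm we assume, for stopping purposes, that a bound is given for $t$.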


Heat Equation Backward-Difference

To approximate the solution to the parabolic partial differential equation

\[ \frac{\partial u}{\partial t}(x, t) - \alpha^2 \frac{\partial^2 u}{\partial x^2}(x, t) = 0, \quad 0 < x < l, \quad 0 < t < T, \]

subject to the boundary conditions

\[ u(0, t) = u(l, t) = 0, \quad 0 < t < T, \]

and the initial conditions

\[ u(x, 0) = f(x), \quad 0 \le x \le l: \]

INPUT endpoint $l$; maximum time $T$; constant $\alpha$; integers $m \ge 3$, $N \ge 1$.

OUTPUT approximations $w_{i,j}$ to $u(x_i, t_j)$ for each $i = 1, \ldots, m - 1$ and $j = 1, \ldots, N$.

Step 1 Set $h = l/m$;
           $k = T/N$;
           $\lambda = \alpha^2 k/h^2$.

Step 2 For $i = 1, \ldots, m - 1$ set $w_i = f(ih)$. (Initial values.)
       (Steps 3-11 solve a tridiagonal linear system using Algorithm 6.7.)

Step 3 Set $l_1 = 1 + 2\lambda$;
           $u_1 = -\lambda/l_1$.

Step 4 For $i = 2, \ldots, m - 2$ set $l_i = 1 + 2\lambda + \lambda u_{i-1}$;
           $u_i = -\lambda/l_i$.

Step 5 Set $l_{m-1} = 1 + 2\lambda + \lambda u_{m-2}$.

Step 6 For $j = 1, \ldots, N$ do Steps 7-11.

    Step 7 Set $t = jk$; (Current $t_j$.)
               $z_1 = w_1/l_1$.

    Step 8 For $i = 2, \ldots, m - 1$ set $z_i = (w_i + \lambda z_{i-1})/l_i$.

    Step 9 Set $w_{m-1} = z_{m-1}$.

    Step 10 For $i = m - 2, \ldots, 1$ set $w_i = z_i - u_i w_{i+1}$.

    Step 11 OUTPUT ($t$); (Note: $t = t_j$.)
            For $i = 1, \ldots, m - 1$ set $x = ih$;
                OUTPUT ($x$, $w_i$). (Note: $w_i = w_{i,j}$.)

Step 12 STOP. (The procedure is complete.)
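The steps of Algorithm 12.2 translate almost line for line into Python. The sketch below is one possible transcription under stated assumptions: the function name `backward_difference` is hypothetical, index 0 of each list is unused padding so that the indices match the algorithm, and only the final time row is returned rather than printed at every step.

```python
import math


def backward_difference(f, l, T, alpha, m, N):
    """Backward-Difference method with Crout factorization (sketch).

    Returns a list w where w[i] approximates u(x_i, T) for i = 1..m-1.
    """
    # Step 1
    h = l / m
    k = T / N
    lam = alpha**2 * k / h**2
    # Step 2: initial values (index 0 is padding, never used).
    w = [0.0] + [f(i * h) for i in range(1, m)]
    # Steps 3-5: Crout factors of the tridiagonal matrix A.
    low = [0.0] * m   # l_i in the algorithm
    up = [0.0] * m    # u_i in the algorithm
    low[1] = 1 + 2 * lam
    up[1] = -lam / low[1]
    for i in range(2, m - 1):
        low[i] = 1 + 2 * lam + lam * up[i - 1]
        up[i] = -lam / low[i]
    low[m - 1] = 1 + 2 * lam + lam * up[m - 2]
    # Steps 6-10: one forward and one backward sweep per time step.
    z = [0.0] * m
    for _ in range(N):
        z[1] = w[1] / low[1]
        for i in range(2, m):
            z[i] = (w[i] + lam * z[i - 1]) / low[i]
        w[m - 1] = z[m - 1]
        for i in range(m - 2, 0, -1):
            w[i] = z[i] - up[i] * w[i + 1]
    return w


if __name__ == "__main__":
    w = backward_difference(lambda x: math.sin(math.pi * x),
                            1.0, 0.5, 1.0, 10, 50)
    for i in range(1, 10):
        print(f"x = {i * 0.1:.1f}  w = {w[i]:.8f}")
```

Because the matrix depends only on $\lambda$, the factorization in Steps 3-5 is computed once and reused for every time step; only the two sweeps are repeated.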


Example 2 Use the Backward-Difference method (Algorithm 12.2) with $h = 0.1$ and $k = 0.01$ to approximate the solution to the heat equation

\[ \frac{\partial u}{\partial t}(x, t) - \frac{\partial^2 u}{\partial x^2}(x, t) = 0, \quad 0 < x < 1, \quad 0 < t, \]

subject to the constraints

\[ u(0, t) = u(1, t) = 0, \quad 0 < t, \qquad u(x, 0) = \sin \pi x, \quad 0 \le x \le 1. \]

Solution This problem was considered in Example 1, where we found that choosing $h = 0.1$ and $k = 0.0005$ gave quite accurate results. However, with the values in this example, $h = 0.1$ and $k = 0.01$, the results were exceptionally poor. To demonstrate the unconditional stability of the Backward-Difference method, we will use $h = 0.1$ and $k = 0.01$ and again compare $w_{i,50}$ to $u(x_i, 0.5)$, where $i = 0, 1, \ldots, 10$.

The results listed in Table 12.4 have the same values of $h$ and $k$ as those in the fifth and sixth columns of Table 12.3, which illustrates the stability of this method.

Table 12.4

x_i   w_{i,50}      u(x_i, 0.5)   |w_{i,50} - u(x_i, 0.5)|
0.0   0             0
0.1   0.00289802    0.00222241    6.756 x 10^-4
0.2   0.00551236    0.00422728    1.285 x 10^-3
0.3   0.00758711    0.00581836    1.769 x 10^-3
0.4   0.00891918    0.00683989    2.079 x 10^-3
0.5   0.00937818    0.00719188    2.186 x 10^-3
0.6   0.00891918    0.00683989    2.079 x 10^-3
0.7   0.00758711    0.00581836    1.769 x 10^-3
0.8   0.00551236    0.00422728    1.285 x 10^-3
0.9   0.00289802    0.00222241    6.756 x 10^-4
1.0   0             0

L. F. Richardson, whom we saw associated with extrapolation, did substantial work in the approximation of partial-differential equations.


The reason that the Backward-Difference method does not have the stability problems of the Forward-Difference method can be seen by analyzing the eigenvalues of the matrix $A$. For the Backward-Difference method (see Exercise 14), the eigenvalues are

\[ \mu_i = 1 + 4\lambda \left[ \sin \left( \frac{i\pi}{2m} \right) \right]^2, \quad \text{for each } i = 1, 2, \ldots, m - 1. \]

Since $\lambda > 0$, we have $\mu_i > 1$ for all $i = 1, 2, \ldots, m - 1$. Since the eigenvalues of $A^{-1}$ are the reciprocals of those of $A$, the spectral radius of $A^{-1}$ satisfies $\rho(A^{-1}) < 1$. This implies that $A^{-1}$ is a convergent matrix.

An error $\mathbf{e}^{(0)}$ in the initial data produces an error $(A^{-1})^n \mathbf{e}^{(0)}$ at the $n$th step of the Backward-Difference method. Since $A^{-1}$ is convergent,

\[ \lim_{n \to \infty} (A^{-1})^n \mathbf{e}^{(0)} = \mathbf{0}. \]

So the method is stable, independent of the choice of $\lambda = \alpha^2 (k/h^2)$. In the terminology of Chapter 5, we call the Backward-Difference method an unconditionally stable method. The local truncation error for the method is of order $O(k + h^2)$, provided the solution of the differential equation satisfies the usual differentiability conditions. In this case, the method converges to the solution of the partial differential equation with this same rate of convergence (see [IK], p. 508).

The weakness of the Backward-Difference method results from the fact that the local truncation error has one part of order $O(h^2)$ and another of order $O(k)$. This requires that time intervals be made much smaller than the x-axis intervals. It would clearly be desirable to have a procedure with local truncation error of order $O(k^2 + h^2)$. The first step in this direction is to use a difference equation that has $O(k^2)$ error for $u_t(x, t)$ instead of those we have used previously, whose error was $O(k)$. This can be done by using the Taylor series in $t$ for the function $u(x, t)$ at the point $(x_i, t_j)$ and evaluating at $(x_i, t_{j+1})$ and $(x_i, t_{j-1})$ to obtain the Centered-Difference formula

\[ \frac{\partial u}{\partial t}(x_i, t_j) = \frac{u(x_i, t_{j+1}) - u(x_i, t_{j-1})}{2k} - \frac{k^2}{6} \frac{\partial^3 u}{\partial t^3}(x_i, \mu_j), \]

where $\mu_j \in (t_{j-1}, t_{j+1})$. The difference method that results from substituting this and the usual difference quotient for $\partial^2 u/\partial x^2$, Eq. (12.8), into the differential equation is called Richardson's method and is given by

\[ \frac{w_{i,j+1} - w_{i,j-1}}{2k} - \alpha^2 \frac{w_{i+1,j} - 2w_{ij} + w_{i-1,j}}{h^2} = 0. \tag{12.14} \]


Copyright 2010 Cengage Learning. All Rights Reserved. May not be copied, scanned, or duplicated, in whole or in part. Due to electronic rights, some third party content may be suppressed from the eBook and/or eChapter(s). 
Editorial review has deemed that any suppressed content does not materially affect the overall learning experience. Cengage Learning reserves the right to remove additional content at any time if subsequent rights restrictions require it. 


This method has local truncation error of order $O(k^2 + h^2)$, but unfortunately, like the Forward-Difference method, it has serious stability problems (see Exercises 11 and 12).

Following work as a mathematical physicist during World War II, John Crank (1916-2006) did research in the numerical solution of partial differential equations; in particular, heat-conduction problems. The Crank-Nicolson method is based on work done with Phyllis Nicolson (1917-1968), a physicist at Leeds University. Their original paper on the method appeared in 1947 [CN].


Crank-Nicolson Method 


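A more rewarding method is derived by averaging the Forward-Difference method at the $j$th step in $t$,

\[ \frac{w_{i,j+1} - w_{i,j}}{k} - \alpha^2 \frac{w_{i+1,j} - 2w_{i,j} + w_{i-1,j}}{h^2} = 0, \]

which has local truncation error

\[ \tau_F = \frac{k}{2} \frac{\partial^2 u}{\partial t^2}(x_i, \mu_j) + O(h^2), \]

and the Backward-Difference method at the $(j + 1)$st step in $t$,

\[ \frac{w_{i,j+1} - w_{i,j}}{k} - \alpha^2 \frac{w_{i+1,j+1} - 2w_{i,j+1} + w_{i-1,j+1}}{h^2} = 0, \]

which has local truncation error

\[ \tau_B = -\frac{k}{2} \frac{\partial^2 u}{\partial t^2}(x_i, \hat{\mu}_j) + O(h^2). \]

If we assume that

\[ \frac{\partial^2 u}{\partial t^2}(x_i, \hat{\mu}_j) \approx \frac{\partial^2 u}{\partial t^2}(x_i, \mu_j), \]

then the averaged-difference method,

\[ \frac{w_{i,j+1} - w_{i,j}}{k} - \frac{\alpha^2}{2} \left[ \frac{w_{i+1,j} - 2w_{i,j} + w_{i-1,j}}{h^2} + \frac{w_{i+1,j+1} - 2w_{i,j+1} + w_{i-1,j+1}}{h^2} \right] = 0, \]

has local truncation error of order $O(k^2 + h^2)$, provided, of course, that the usual differentiability conditions are satisfied.

This is known as the Crank-Nicolson method and is represented in the matrix form

\[ A\mathbf{w}^{(j+1)} = B\mathbf{w}^{(j)}, \quad \text{for each } j = 0, 1, 2, \ldots, \tag{12.15} \]

where

\[ \lambda = \alpha^2 \frac{k}{h^2}, \quad \mathbf{w}^{(j)} = (w_{1,j}, w_{2,j}, \ldots, w_{m-1,j})^t, \]

and the matrices $A$ and $B$ are given by:

\[ A = \begin{bmatrix}
(1 + \lambda) & -\frac{\lambda}{2} & 0 & \cdots & 0 \\
-\frac{\lambda}{2} & (1 + \lambda) & -\frac{\lambda}{2} & \ddots & \vdots \\
0 & \ddots & \ddots & \ddots & 0 \\
\vdots & \ddots & -\frac{\lambda}{2} & (1 + \lambda) & -\frac{\lambda}{2} \\
0 & \cdots & 0 & -\frac{\lambda}{2} & (1 + \lambda)
\end{bmatrix} \]

and

\[ B = \begin{bmatrix}
(1 - \lambda) & \frac{\lambda}{2} & 0 & \cdots & 0 \\
\frac{\lambda}{2} & (1 - \lambda) & \frac{\lambda}{2} & \ddots & \vdots \\
0 & \ddots & \ddots & \ddots & 0 \\
\vdots & \ddots & \frac{\lambda}{2} & (1 - \lambda) & \frac{\lambda}{2} \\
0 & \cdots & 0 & \frac{\lambda}{2} & (1 - \lambda)
\end{bmatrix}. \]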

The nonsingular matrix $A$ is positive definite, strictly diagonally dominant, and tridiagonal. Either the Crout Factorization Algorithm 6.7 or the SOR Algorithm 7.3 can be used to obtain $\mathbf{w}^{(j+1)}$ from $\mathbf{w}^{(j)}$, for each $j = 0, 1, 2, \ldots$. Algorithm 12.3 incorporates Crout factorization into the Crank-Nicolson technique. As in Algorithm 12.2, a finite length for the time interval must be specified to determine a stopping procedure. The verification that the Crank-Nicolson method is unconditionally stable and has order of convergence $O(k^2 + h^2)$ can be found in [IK], pp. 508-512. A diagram showing the interaction of the nodes for determining an approximation at $(x_i, t_{j+1})$ is shown in Figure 12.11.


Figure 12.11

[Figure: mesh points used by the Crank-Nicolson method]

Crank-Nicolson

To approximate the solution to the parabolic partial differential equation

\[ \frac{\partial u}{\partial t}(x, t) - \alpha^2 \frac{\partial^2 u}{\partial x^2}(x, t) = 0, \quad 0 < x < l, \quad 0 < t < T, \]

subject to the boundary conditions

\[ u(0, t) = u(l, t) = 0, \quad 0 < t < T, \]

and the initial conditions

\[ u(x, 0) = f(x), \quad 0 \le x \le l: \]

INPUT endpoint $l$; maximum time $T$; constant $\alpha$; integers $m \ge 3$, $N \ge 1$.

OUTPUT approximations $w_{i,j}$ to $u(x_i, t_j)$ for each $i = 1, \ldots, m - 1$ and $j = 1, \ldots, N$.

Step 1 Set $h = l/m$;
           $k = T/N$;
           $\lambda = \alpha^2 k/h^2$;
           $w_m = 0$.

Step 2 For $i = 1, \ldots, m - 1$ set $w_i = f(ih)$. (Initial values.)
       (Steps 3-11 solve a tridiagonal linear system using Algorithm 6.7.)

Step 3 Set $l_1 = 1 + \lambda$;
           $u_1 = -\lambda/(2 l_1)$.

Step 4 For $i = 2, \ldots, m - 2$ set $l_i = 1 + \lambda + \lambda u_{i-1}/2$;
           $u_i = -\lambda/(2 l_i)$.

Step 5 Set $l_{m-1} = 1 + \lambda + \lambda u_{m-2}/2$.

Step 6 For $j = 1, \ldots, N$ do Steps 7-11.

    Step 7 Set $t = jk$; (Current $t_j$.)
               $z_1 = \left[ (1 - \lambda) w_1 + \frac{\lambda}{2} w_2 \right] / l_1$.

    Step 8 For $i = 2, \ldots, m - 1$ set
               $z_i = \left[ (1 - \lambda) w_i + \frac{\lambda}{2} (w_{i+1} + w_{i-1} + z_{i-1}) \right] / l_i$.

    Step 9 Set $w_{m-1} = z_{m-1}$.

    Step 10 For $i = m - 2, \ldots, 1$ set $w_i = z_i - u_i w_{i+1}$.

    Step 11 OUTPUT ($t$); (Note: $t = t_j$.)
            For $i = 1, \ldots, m - 1$ set $x = ih$;
                OUTPUT ($x$, $w_i$). (Note: $w_i = w_{i,j}$.)

Step 12 STOP. (The procedure is complete.)
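A Python transcription of Algorithm 12.3 might look as follows. It is a sketch under assumptions rather than a reference implementation: the name `crank_nicolson` is hypothetical, the list `w` carries indices 0 through m with both boundary entries held at zero, and only the final time row is returned.

```python
import math


def crank_nicolson(f, l, T, alpha, m, N):
    """Crank-Nicolson method with Crout factorization (sketch).

    Returns a list w where w[i] approximates u(x_i, T) for i = 0..m.
    """
    # Step 1 (w[0] and w[m] stay zero: the boundary conditions).
    h = l / m
    k = T / N
    lam = alpha**2 * k / h**2
    # Step 2
    w = [0.0] * (m + 1)
    for i in range(1, m):
        w[i] = f(i * h)
    # Steps 3-5: Crout factors of A = tridiag(-lam/2, 1 + lam, -lam/2).
    low = [0.0] * m
    up = [0.0] * m
    low[1] = 1 + lam
    up[1] = -lam / (2 * low[1])
    for i in range(2, m - 1):
        low[i] = 1 + lam + lam * up[i - 1] / 2
        up[i] = -lam / (2 * low[i])
    low[m - 1] = 1 + lam + lam * up[m - 2] / 2
    # Steps 6-10: form B w, then forward and backward sweeps.
    z = [0.0] * m
    for _ in range(N):
        z[1] = ((1 - lam) * w[1] + lam / 2 * w[2]) / low[1]
        for i in range(2, m):
            z[i] = ((1 - lam) * w[i]
                    + lam / 2 * (w[i + 1] + w[i - 1] + z[i - 1])) / low[i]
        w[m - 1] = z[m - 1]
        for i in range(m - 2, 0, -1):
            w[i] = z[i] - up[i] * w[i + 1]
    return w


if __name__ == "__main__":
    w = crank_nicolson(lambda x: math.sin(math.pi * x), 1.0, 0.5, 1.0, 10, 50)
    for i in range(11):
        print(f"x = {i * 0.1:.1f}  w = {w[i]:.8f}")
```

The forward sweep reads only values of `w` from the previous time level, since `w` is overwritten only in the backward sweep; this is why the right-hand side $B\mathbf{w}^{(j)}$ never needs to be stored separately.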


Example 3 Use the Crank-Nicolson method with $h = 0.1$ and $k = 0.01$ to approximate the solution to the problem

\[ \frac{\partial u}{\partial t}(x, t) - \frac{\partial^2 u}{\partial x^2}(x, t) = 0, \quad 0 < x < 1, \quad 0 < t, \]

subject to the conditions

\[ u(0, t) = u(1, t) = 0, \quad 0 < t, \]

and

\[ u(x, 0) = \sin(\pi x), \quad 0 \le x \le 1. \]

Solution Choosing $h = 0.1$ and $k = 0.01$ gives $m = 10$, $N = 50$, and $\lambda = 1$ in Algorithm 12.3. Recall that the Forward-Difference method gave dramatically poor results for this choice of $h$ and $k$, but the Backward-Difference method gave results that were accurate to about $2 \times 10^{-3}$ for entries in the middle of the table. The results in Table 12.5 indicate the increase in accuracy of the Crank-Nicolson method over the Backward-Difference method, the best of the two previously discussed techniques.

Table 12.5

x_i   w_{i,50}      u(x_i, 0.5)   |w_{i,50} - u(x_i, 0.5)|
0.0   0             0
0.1   0.00230512    0.00222241    8.271 x 10^-5
0.2   0.00438461    0.00422728    1.573 x 10^-4
0.3   0.00603489    0.00581836    2.165 x 10^-4
0.4   0.00709444    0.00683989    2.546 x 10^-4
0.5   0.00745954    0.00719188    2.677 x 10^-4
0.6   0.00709444    0.00683989    2.546 x 10^-4
0.7   0.00603489    0.00581836    2.165 x 10^-4
0.8   0.00438461    0.00422728    1.573 x 10^-4
0.9   0.00230512    0.00222241    8.271 x 10^-5
1.0   0             0


EXERCISE SET 12.2 


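1. Approximate the solution to the following partial differential equation using the Backward-Difference method.

\[ \frac{\partial u}{\partial t} - \frac{\partial^2 u}{\partial x^2} = 0, \quad 0 < x < 2, \quad 0 < t; \]
\[ u(0, t) = u(2, t) = 0, \quad 0 < t, \qquad u(x, 0) = \sin \frac{\pi}{2} x, \quad 0 \le x \le 2. \]

Use $m = 4$, $T = 0.1$, and $N = 2$, and compare your results to the actual solution $u(x, t) = e^{-(\pi^2/4)t} \sin \frac{\pi}{2} x$.

2. Approximate the solution to the following partial differential equation using the Backward-Difference method.

\[ \frac{\partial u}{\partial t} - \frac{1}{16} \frac{\partial^2 u}{\partial x^2} = 0, \quad 0 < x < 1, \quad 0 < t; \]
\[ u(0, t) = u(1, t) = 0, \quad 0 < t, \qquad u(x, 0) = 2 \sin 2\pi x, \quad 0 \le x \le 1. \]

Use $m = 3$, $T = 0.1$, and $N = 2$, and compare your results to the actual solution $u(x, t) = 2e^{-(\pi^2/4)t} \sin 2\pi x$.

3. Repeat Exercise 1 using the Crank-Nicolson Algorithm.

4. Repeat Exercise 2 using the Crank-Nicolson Algorithm.

5. Use the Forward-Difference method to approximate the solution to the following parabolic partial differential equations.

a. \[ \frac{\partial u}{\partial t} - \frac{\partial^2 u}{\partial x^2} = 0, \quad 0 < x < 2, \quad 0 < t; \]
\[ u(0, t) = u(2, t) = 0, \quad 0 < t, \]
\[ u(x, 0) = \sin 2\pi x, \quad 0 \le x \le 2. \]
Use $h = 0.4$ and $k = 0.1$, and compare your results at $t = 0.5$ to the actual solution $u(x, t) = e^{-4\pi^2 t} \sin 2\pi x$. Then use $h = 0.4$ and $k = 0.05$, and compare the answers.

b. \[ \frac{\partial u}{\partial t} - \frac{\partial^2 u}{\partial x^2} = 0, \quad 0 < x < \pi, \quad 0 < t; \]
\[ u(0, t) = u(\pi, t) = 0, \quad 0 < t, \]
\[ u(x, 0) = \sin x, \quad 0 \le x \le \pi. \]
Use $h = \pi/10$ and $k = 0.05$, and compare your results at $t = 0.5$ to the actual solution $u(x, t) = e^{-t} \sin x$.

6. Use the Forward-Difference method to approximate the solution to the following parabolic partial differential equations.

a. \[ \frac{\partial u}{\partial t} - \frac{4}{\pi^2} \frac{\partial^2 u}{\partial x^2} = 0, \quad 0 < x < 4, \quad 0 < t; \]
\[ u(0, t) = u(4, t) = 0, \quad 0 < t, \]
\[ u(x, 0) = \sin \frac{\pi}{4} x \left( 1 + 2 \cos \frac{\pi}{4} x \right), \quad 0 \le x \le 4. \]
Use $h = 0.2$ and $k = 0.04$, and compare your results at $t = 0.4$ to the actual solution $u(x, t) = e^{-t} \sin \frac{\pi}{2} x + e^{-t/4} \sin \frac{\pi}{4} x$.

b. \[ \frac{\partial u}{\partial t} - \frac{1}{\pi^2} \frac{\partial^2 u}{\partial x^2} = 0, \quad 0 < x < 1, \quad 0 < t; \]
\[ u(0, t) = u(1, t) = 0, \quad 0 < t, \]
\[ u(x, 0) = \cos \pi \left( x - \frac{1}{2} \right), \quad 0 \le x \le 1. \]
Use $h = 0.1$ and $k = 0.04$, and compare your results at $t = 0.4$ to the actual solution $u(x, t) = e^{-t} \cos \pi \left( x - \frac{1}{2} \right)$.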
7. Repeat Exercise 5 using the Backward-Difference Algorithm. 
8. Repeat Exercise 6 using the Backward-Difference Algorithm. 
9. Repeat Exercise 5 using the Crank-Nicolson Algorithm. 
10. Repeat Exercise 6 using the Crank-Nicolson Algorithm. 
11. Repeat Exercise 5 using Richardson’s method. 
12. Repeat Exercise 6 using Richardson’s method. 


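13. Show that the eigenvalues for the $(m - 1)$ by $(m - 1)$ tridiagonal method matrix $A$ given by

\[ a_{ij} = \begin{cases} \lambda, & j = i - 1 \text{ or } j = i + 1, \\ 1 - 2\lambda, & j = i, \\ 0, & \text{otherwise} \end{cases} \]

are

\[ \mu_i = 1 - 4\lambda \left( \sin \frac{i\pi}{2m} \right)^2, \quad \text{for each } i = 1, 2, \ldots, m - 1, \]

with corresponding eigenvectors $\mathbf{v}^{(i)}$, where $v_j^{(i)} = \sin(ij\pi/m)$.

14. Show that the $(m - 1)$ by $(m - 1)$ tridiagonal method matrix $A$ given by

\[ a_{ij} = \begin{cases} -\lambda, & j = i - 1 \text{ or } j = i + 1, \\ 1 + 2\lambda, & j = i, \\ 0, & \text{otherwise,} \end{cases} \]

where $\lambda > 0$, is positive definite and diagonally dominant and has eigenvalues

\[ \mu_i = 1 + 4\lambda \left( \sin \frac{i\pi}{2m} \right)^2, \quad \text{for each } i = 1, 2, \ldots, m - 1, \]

with corresponding eigenvectors $\mathbf{v}^{(i)}$, where $v_j^{(i)} = \sin(ij\pi/m)$.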

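15. Modify Algorithms 12.2 and 12.3 to include the parabolic partial differential equation

\[ \frac{\partial u}{\partial t} - \frac{\partial^2 u}{\partial x^2} = F(x), \quad 0 < x < l, \quad 0 < t; \]
\[ u(0, t) = u(l, t) = 0, \quad 0 < t; \]
\[ u(x, 0) = f(x), \quad 0 \le x \le l. \]

16. Use the results of Exercise 15 to approximate the solution to

\[ \frac{\partial u}{\partial t} - \frac{\partial^2 u}{\partial x^2} = 2, \quad 0 < x < 1, \quad 0 < t; \]
\[ u(0, t) = u(1, t) = 0, \quad 0 < t; \]
\[ u(x, 0) = \sin \pi x + x(1 - x), \]

with $h = 0.1$ and $k = 0.01$. Compare your answer at $t = 0.25$ to the actual solution $u(x, t) = e^{-\pi^2 t} \sin \pi x + x(1 - x)$.

17. Change Algorithms 12.2 and 12.3 to accommodate the partial differential equation

\[ \frac{\partial u}{\partial t} - \alpha^2 \frac{\partial^2 u}{\partial x^2} = 0, \quad 0 < x < l, \quad 0 < t; \]
\[ u(0, t) = \phi(t), \quad u(l, t) = \Psi(t), \quad 0 < t; \]
\[ u(x, 0) = f(x), \quad 0 \le x \le l, \]

where $f(0) = \phi(0)$ and $f(l) = \Psi(0)$.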

18. The temperature $u(x, t)$ of a long, thin rod of constant cross section and homogeneous conducting material is governed by the one-dimensional heat equation. If heat is generated in the material, for example, by resistance to current or nuclear reaction, the heat equation becomes

\[ \frac{\partial^2 u}{\partial x^2} + \frac{Kr}{\rho C} = K \frac{\partial u}{\partial t}, \quad 0 < x < l, \quad 0 < t, \]

where $l$ is the length, $\rho$ is the density, $C$ is the specific heat, and $K$ is the thermal diffusivity of the rod. The function $r = r(x, t, u)$ represents the heat generated per unit volume. Suppose that

\[ l = 1.5 \text{ cm}, \quad K = 1.04 \text{ cal/cm·deg·s}, \quad \rho = 10.6 \text{ g/cm}^3, \quad C = 0.056 \text{ cal/g·deg}, \]

and

\[ r(x, t, u) = 5.0 \text{ cal/cm}^3\text{·s}. \]

If the ends of the rod are kept at 0°C, then

\[ u(0, t) = u(l, t) = 0, \quad t > 0. \]

Suppose the initial temperature distribution is given by

\[ u(x, 0) = \sin \frac{\pi x}{l}, \quad 0 \le x \le l. \]

Use the results of Exercise 15 to approximate the temperature distribution with $h = 0.15$ and $k = 0.0225$.


19. Sagar and Payne [SP] analyze the stress-strain relationships and material properties of a cylinder alternately subjected to heating and cooling and consider the equation

\[ \frac{\partial^2 T}{\partial r^2} + \frac{1}{r} \frac{\partial T}{\partial r} = \frac{1}{4K} \frac{\partial T}{\partial t}, \quad \frac{1}{2} < r < 1, \quad 0 < t, \]

where $T = T(r, t)$ is the temperature, $r$ is the radial distance from the center of the cylinder, $t$ is time, and $K$ is a diffusivity coefficient.

a. Find approximations to $T(r, 10)$ for a cylinder with outside radius 1, given the initial and boundary conditions:

\[ T(1, t) = 100 + 40t, \quad T\left( \frac{1}{2}, t \right) = t, \quad 0 \le t \le 10; \]
\[ T(r, 0) = 200(r - 0.5), \quad 0.5 \le r \le 1. \]

Use a modification of the Backward-Difference method with $K = 0.1$, $k = 0.5$, and $h = \Delta r = 0.1$.

b. Use the temperature distribution of part (a) to calculate the strain $I$ by approximating the integral

\[ I = \int_{0.5}^{1} \alpha T(r, t) \, r \, dr, \]

where $\alpha = 10.7$ and $t = 10$. Use the Composite Trapezoidal method with $n = 5$.


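12.3 Hyperbolic Partial Differential Equations

In this section, we consider the numerical solution to the wave equation, an example of a hyperbolic partial differential equation. The wave equation is given by the differential equation

\[ \frac{\partial^2 u}{\partial t^2}(x, t) - \alpha^2 \frac{\partial^2 u}{\partial x^2}(x, t) = 0, \quad 0 < x < l, \quad t > 0, \tag{12.16} \]

subject to the conditions

\[ u(0, t) = u(l, t) = 0, \quad \text{for } t > 0, \]
\[ u(x, 0) = f(x), \quad \text{and} \quad \frac{\partial u}{\partial t}(x, 0) = g(x), \quad \text{for } 0 \le x \le l, \]

where $\alpha$ is a constant dependent on the physical conditions of the problem.

Select an integer $m > 0$ to define the x-axis grid points using $h = l/m$. In addition, select a time-step size $k > 0$. The mesh points $(x_i, t_j)$ are defined by

\[ x_i = ih \quad \text{and} \quad t_j = jk, \]

for each $i = 0, 1, \ldots, m$ and $j = 0, 1, \ldots$.

At any interior mesh point $(x_i, t_j)$, the wave equation becomes

\[ \frac{\partial^2 u}{\partial t^2}(x_i, t_j) - \alpha^2 \frac{\partial^2 u}{\partial x^2}(x_i, t_j) = 0. \tag{12.17} \]

The difference method is obtained using the centered-difference quotient for the second partial derivatives given by

\[ \frac{\partial^2 u}{\partial t^2}(x_i, t_j) = \frac{u(x_i, t_{j+1}) - 2u(x_i, t_j) + u(x_i, t_{j-1})}{k^2} - \frac{k^2}{12} \frac{\partial^4 u}{\partial t^4}(x_i, \mu_j), \]

where $\mu_j \in (t_{j-1}, t_{j+1})$, and

\[ \frac{\partial^2 u}{\partial x^2}(x_i, t_j) = \frac{u(x_{i+1}, t_j) - 2u(x_i, t_j) + u(x_{i-1}, t_j)}{h^2} - \frac{h^2}{12} \frac{\partial^4 u}{\partial x^4}(\xi_i, t_j), \]

where $\xi_i \in (x_{i-1}, x_{i+1})$. Substituting these into Eq. (12.17) gives

\[ \frac{u(x_i, t_{j+1}) - 2u(x_i, t_j) + u(x_i, t_{j-1})}{k^2} - \alpha^2 \frac{u(x_{i+1}, t_j) - 2u(x_i, t_j) + u(x_{i-1}, t_j)}{h^2} = \frac{1}{12} \left[ k^2 \frac{\partial^4 u}{\partial t^4}(x_i, \mu_j) - \alpha^2 h^2 \frac{\partial^4 u}{\partial x^4}(\xi_i, t_j) \right]. \]

Neglecting the error term

\[ \tau_{i,j} = \frac{1}{12} \left[ k^2 \frac{\partial^4 u}{\partial t^4}(x_i, \mu_j) - \alpha^2 h^2 \frac{\partial^4 u}{\partial x^4}(\xi_i, t_j) \right] \tag{12.18} \]

leads to the difference equation

\[ \frac{w_{i,j+1} - 2w_{i,j} + w_{i,j-1}}{k^2} - \alpha^2 \frac{w_{i+1,j} - 2w_{i,j} + w_{i-1,j}}{h^2} = 0. \]

Define $\lambda = \alpha k/h$. Then we can write the difference equation as

\[ w_{i,j+1} - 2w_{i,j} + w_{i,j-1} - \lambda^2 w_{i+1,j} + 2\lambda^2 w_{i,j} - \lambda^2 w_{i-1,j} = 0 \]

and solve for $w_{i,j+1}$, the most advanced time-step approximation, to obtain

\[ w_{i,j+1} = 2(1 - \lambda^2) w_{i,j} + \lambda^2 (w_{i+1,j} + w_{i-1,j}) - w_{i,j-1}. \tag{12.19} \]

This equation holds for each $i = 1, 2, \ldots, m - 1$ and $j = 1, 2, \ldots$. The boundary conditions give

\[ w_{0,j} = w_{m,j} = 0, \quad \text{for each } j = 1, 2, 3, \ldots, \tag{12.20} \]

and the initial condition implies that

\[ w_{i,0} = f(x_i), \quad \text{for each } i = 1, 2, \ldots, m - 1. \tag{12.21} \]

Writing this set of equations in matrix form gives

\[ \begin{bmatrix} w_{1,j+1} \\ w_{2,j+1} \\ \vdots \\ w_{m-1,j+1} \end{bmatrix}
= \begin{bmatrix}
2(1 - \lambda^2) & \lambda^2 & 0 & \cdots & 0 \\
\lambda^2 & 2(1 - \lambda^2) & \lambda^2 & \ddots & \vdots \\
0 & \ddots & \ddots & \ddots & 0 \\
\vdots & \ddots & \lambda^2 & 2(1 - \lambda^2) & \lambda^2 \\
0 & \cdots & 0 & \lambda^2 & 2(1 - \lambda^2)
\end{bmatrix}
\begin{bmatrix} w_{1,j} \\ w_{2,j} \\ \vdots \\ w_{m-1,j} \end{bmatrix}
- \begin{bmatrix} w_{1,j-1} \\ w_{2,j-1} \\ \vdots \\ w_{m-1,j-1} \end{bmatrix}. \tag{12.22} \]

Equations (12.19) and (12.22) imply that the $(j + 1)$st time step requires values from the $j$th and $(j - 1)$st time steps. (See Figure 12.12.) This produces a minor starting problem because values for $j = 0$ are given by Eq. (12.21), but values for $j = 1$, which are needed in Eq. (12.19) to compute $w_{i,2}$, must be obtained from the initial-velocity condition

\[ \frac{\partial u}{\partial t}(x, 0) = g(x), \quad 0 \le x \le l. \]

Figure 12.12

[Figure: mesh points used by the wave-equation difference method]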
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One approach is to replace du/dt by a forward-difference approximation, 


2 

(0) = NS Gin fi), (12.23) 

for some (4; in (0, t,). Solving for u(x;, t;) in the equation gives 
k2 O2u 
u(x;,t,) = u(x;,0) + ce “(8is0) + — 7 o2 = Gri 
kK? au 
= u(x, 0) + kg) + > 5 (oi ft. 

Deleting the truncation term gives the approximation, 

Wit = Wio +kg(x), foreachi=1,...,m—1. (12.24) 


However, this approximation has truncation error of only O(k) whereas the truncation error 
in Eq. (12.19) is O(k?). 


Improving the Initial Approximation 


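To obtain a better approximation to $u(x_i, t_1)$, expand $u(x_i, t_1)$ in a second Maclaurin polynomial in $t$. Then

\[ u(x_i, t_1) = u(x_i, 0) + k \frac{\partial u}{\partial t}(x_i, 0) + \frac{k^2}{2} \frac{\partial^2 u}{\partial t^2}(x_i, 0) + \frac{k^3}{6} \frac{\partial^3 u}{\partial t^3}(x_i, \hat{\mu}_i), \]

for some $\hat{\mu}_i$ in $(0, t_1)$. If $f''$ exists, then

\[ \frac{\partial^2 u}{\partial t^2}(x_i, 0) = \alpha^2 \frac{\partial^2 u}{\partial x^2}(x_i, 0) = \alpha^2 \frac{d^2 f}{dx^2}(x_i) = \alpha^2 f''(x_i) \]

and

\[ u(x_i, t_1) = u(x_i, 0) + k g(x_i) + \frac{\alpha^2 k^2}{2} f''(x_i) + \frac{k^3}{6} \frac{\partial^3 u}{\partial t^3}(x_i, \hat{\mu}_i). \]

This produces an approximation with error $O(k^3)$:

\[ w_{i,1} = w_{i,0} + k g(x_i) + \frac{\alpha^2 k^2}{2} f''(x_i). \]

If $f \in C^4[0, 1]$ but $f''(x_i)$ is not readily available, we can use the difference equation in Eq. (4.9) to write

\[ f''(x_i) = \frac{f(x_{i+1}) - 2f(x_i) + f(x_{i-1})}{h^2} - \frac{h^2}{12} f^{(4)}(\tilde{\xi}), \]

for some $\tilde{\xi}$ in $(x_{i-1}, x_{i+1})$. This implies that

\[ u(x_i, t_1) = u(x_i, 0) + k g(x_i) + \frac{k^2 \alpha^2}{2h^2} \left[ f(x_{i+1}) - 2f(x_i) + f(x_{i-1}) \right] + O(k^3 + h^2 k^2). \]

Because $\lambda = k\alpha/h$, we can write this as

\[ u(x_i, t_1) = u(x_i, 0) + k g(x_i) + \frac{\lambda^2}{2} \left[ f(x_{i+1}) - 2f(x_i) + f(x_{i-1}) \right] + O(k^3 + h^2 k^2) \]
\[ = (1 - \lambda^2) f(x_i) + \frac{\lambda^2}{2} f(x_{i+1}) + \frac{\lambda^2}{2} f(x_{i-1}) + k g(x_i) + O(k^3 + h^2 k^2). \]

Thus, the difference equation,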
Va Pa 
wig = 1A) fi) + yf G1) + 5 F@i-v + kg(xi), (12.25) 


can be used to find w;,, foreachi = 1,2,...,m—1. To determine subsequent approximates 
we use the system in (12.22). 
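The difference between the two starting formulas can be checked numerically. The following is a minimal sketch, assuming the data of Example 1 later in this section ($\alpha = 2$, $h = 0.1$, $k = 0.05$, $f(x) = \sin \pi x$, $g \equiv 0$, exact solution $\sin \pi x \cos 2\pi t$); the variable names are illustrative only.

```python
import math

# Data assumed from Example 1: alpha = 2, h = 0.1, k = 0.05, so lambda = 1.
alpha, h, k = 2.0, 0.1, 0.05
lam = k * alpha / h

f = lambda x: math.sin(math.pi * x)          # initial displacement
g = lambda x: 0.0                            # initial velocity
u = lambda x, t: math.sin(math.pi * x) * math.cos(2 * math.pi * t)  # exact solution

err_24 = 0.0  # worst error of Eq. (12.24) at t = k
err_25 = 0.0  # worst error of Eq. (12.25) at t = k
for i in range(1, 10):
    x = i * h
    w_24 = f(x) + k * g(x)                                   # Eq. (12.24)
    w_25 = ((1 - lam**2) * f(x)
            + 0.5 * lam**2 * (f(x + h) + f(x - h))
            + k * g(x))                                      # Eq. (12.25)
    err_24 = max(err_24, abs(w_24 - u(x, k)))
    err_25 = max(err_25, abs(w_25 - u(x, k)))

print("max error of (12.24):", err_24)
print("max error of (12.25):", err_25)
```

For this particular data Eq. (12.25) reproduces the exact values at $t = k$ up to round-off, while Eq. (12.24) is off by a few percent; other data will generally show a smaller gap.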

Algorithm 12.4 uses Eq. (12.25) to approximate $w_{i,1}$, although Eq. (12.24) could also
be used. It is assumed that there is an upper bound for the value of $t$ to be used in the
stopping technique, and that $k = T/N$, where $N$ is also given.


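Wave Equation Finite-Difference

To approximate the solution to the wave equation

\[ \frac{\partial^2 u}{\partial t^2}(x, t) - \alpha^2\frac{\partial^2 u}{\partial x^2}(x, t) = 0, \quad 0 < x < l, \quad 0 < t < T, \]

subject to the boundary conditions

\[ u(0, t) = u(l, t) = 0, \quad 0 < t < T, \]

and the initial conditions

\[ u(x, 0) = f(x), \quad \text{and} \quad \frac{\partial u}{\partial t}(x, 0) = g(x), \quad \text{for } 0 \le x \le l, \]

INPUT endpoint $l$; maximum time $T$; constant $\alpha$; integers $m \ge 2$, $N \ge 2$.
OUTPUT approximations $w_{i,j}$ to $u(x_i, t_j)$ for each $i = 0, \ldots, m$ and $j = 0, \ldots, N$.

Step 1 Set $h = l/m$;
           $k = T/N$;
           $\lambda = k\alpha/h$.

Step 2 For $j = 1, \ldots, N$ set $w_{0,j} = 0$;
           $w_{m,j} = 0$.

Step 3 Set $w_{0,0} = f(0)$;
           $w_{m,0} = f(l)$.

Step 4 For $i = 1, \ldots, m-1$ (Initialize for $t = 0$ and $t = k$.)
           set $w_{i,0} = f(ih)$;
           $w_{i,1} = (1 - \lambda^2) f(ih) + \frac{\lambda^2}{2}\left[f((i+1)h) + f((i-1)h)\right] + k g(ih)$.

Step 5 For $j = 1, \ldots, N-1$ (Perform matrix multiplication.)
           for $i = 1, \ldots, m-1$
               set $w_{i,j+1} = 2(1 - \lambda^2) w_{i,j} + \lambda^2 (w_{i+1,j} + w_{i-1,j}) - w_{i,j-1}$.

Step 6 For $j = 0, \ldots, N$
           set $t = jk$;
           for $i = 0, \ldots, m$
               set $x = ih$;
               OUTPUT $(x, t, w_{i,j})$.

Step 7 STOP. (The procedure is complete.)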
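The steps of Algorithm 12.4 translate almost line for line into code. The following is a minimal sketch in plain Python, not a reference implementation; the function name `wave_finite_difference` and the storage layout `w[j][i]` (time level first) are choices made here for illustration. It is run on the data of Example 1 below.

```python
import math

def wave_finite_difference(f, g, l, T, alpha, m, N):
    """Explicit scheme for u_tt - alpha^2 u_xx = 0 on 0 < x < l, 0 < t < T.

    Returns w with w[j][i] approximating u(i*h, j*k); note the time index
    comes first here, whereas the text writes w_{i,j}.
    """
    # Step 1: mesh sizes and lambda
    h = l / m
    k = T / N
    lam = k * alpha / h
    # Step 2: allocating zeros sets the boundary values w_{0,j} = w_{m,j} = 0
    w = [[0.0] * (m + 1) for _ in range(N + 1)]
    # Step 3: endpoint values at t = 0
    w[0][0] = f(0.0)
    w[0][m] = f(l)
    # Step 4: initialize t = 0 and t = k using Eq. (12.25)
    for i in range(1, m):
        w[0][i] = f(i * h)
        w[1][i] = ((1 - lam**2) * f(i * h)
                   + 0.5 * lam**2 * (f((i + 1) * h) + f((i - 1) * h))
                   + k * g(i * h))
    # Step 5: march forward in time
    for j in range(1, N):
        for i in range(1, m):
            w[j + 1][i] = (2 * (1 - lam**2) * w[j][i]
                           + lam**2 * (w[j][i + 1] + w[j][i - 1])
                           - w[j - 1][i])
    return w

# Data of Example 1: alpha = 2, h = 0.1 (m = 10), k = 0.05 (N = 20), T = 1.
w = wave_finite_difference(lambda x: math.sin(math.pi * x), lambda x: 0.0,
                           l=1.0, T=1.0, alpha=2.0, m=10, N=20)
exact = [math.sin(math.pi * 0.1 * i) * math.cos(2 * math.pi) for i in range(11)]
max_err = max(abs(a - b) for a, b in zip(w[20], exact))
print("max error at t = 1:", max_err)
```

Storing every time level mirrors the OUTPUT step of the algorithm; if only the final time is needed, keeping just the two most recent levels would suffice.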
Example 1

Approximate the solution to the hyperbolic problem

\[ \frac{\partial^2 u}{\partial t^2}(x, t) - 4\frac{\partial^2 u}{\partial x^2}(x, t) = 0, \quad 0 < x < 1, \quad 0 < t, \]

with boundary conditions

\[ u(0, t) = u(1, t) = 0, \quad \text{for } 0 < t, \]

and initial conditions

\[ u(x, 0) = \sin(\pi x), \quad 0 \le x \le 1, \quad \text{and} \quad \frac{\partial u}{\partial t}(x, 0) = 0, \quad 0 \le x \le 1, \]

using $h = 0.1$ and $k = 0.05$. Compare the results with the exact solution

\[ u(x, t) = \sin \pi x \cos 2\pi t. \]

Solution Choosing $h = 0.1$ and $k = 0.05$ gives $\lambda = 1$, $m = 10$, and $N = 20$. We will
choose a maximum time $T = 1$ and apply the Finite-Difference Algorithm 12.4. This
produces the approximations $w_{i,N}$ to $u(0.1i, 1)$ for $i = 0, 1, \ldots, 10$. These results are shown
in Table 12.6 and are correct to the places given.

Table 12.6

x_i     w_{i,20}
0.0     0.0000000000
0.1     0.3090169944
0.2     0.5877852523
0.3     0.8090169944
0.4     0.9510565163
0.5     1.0000000000
0.6     0.9510565163
0.7     0.8090169944
0.8     0.5877852523
0.9     0.3090169944
1.0     0.0000000000


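The results of the example were very accurate, more so than the truncation error $O(k^2 + h^2)$
would lead us to believe. This is because the true solution to the equation is infinitely
differentiable. When this is the case, Taylor series gives

\[
\frac{u(x_{i+1}, t_j) - 2u(x_i, t_j) + u(x_{i-1}, t_j)}{h^2}
= \frac{\partial^2 u}{\partial x^2}(x_i, t_j) + 2\left[\frac{h^2}{4!}\frac{\partial^4 u}{\partial x^4}(x_i, t_j) + \frac{h^4}{6!}\frac{\partial^6 u}{\partial x^6}(x_i, t_j) + \cdots\right]
\]

and

\[
\frac{u(x_i, t_{j+1}) - 2u(x_i, t_j) + u(x_i, t_{j-1})}{k^2}
= \frac{\partial^2 u}{\partial t^2}(x_i, t_j) + 2\left[\frac{k^2}{4!}\frac{\partial^4 u}{\partial t^4}(x_i, t_j) + \frac{k^4}{6!}\frac{\partial^6 u}{\partial t^6}(x_i, t_j) + \cdots\right].
\]

Since $u(x, t)$ satisfies the partial differential equation,

\[
\begin{aligned}
&\frac{u(x_i, t_{j+1}) - 2u(x_i, t_j) + u(x_i, t_{j-1})}{k^2} - \alpha^2\frac{u(x_{i+1}, t_j) - 2u(x_i, t_j) + u(x_{i-1}, t_j)}{h^2} \\
&\quad = 2\left[\frac{1}{4!}\left(k^2\frac{\partial^4 u}{\partial t^4}(x_i, t_j) - \alpha^2 h^2\frac{\partial^4 u}{\partial x^4}(x_i, t_j)\right)
+ \frac{1}{6!}\left(k^4\frac{\partial^6 u}{\partial t^6}(x_i, t_j) - \alpha^2 h^4\frac{\partial^6 u}{\partial x^6}(x_i, t_j)\right) + \cdots\right].
\end{aligned}
\tag{12.26}
\]

However, differentiating the wave equation gives

\[
\begin{aligned}
k^2\frac{\partial^4 u}{\partial t^4}(x_i, t_j)
&= k^2\frac{\partial^2}{\partial t^2}\left[\alpha^2\frac{\partial^2 u}{\partial x^2}(x_i, t_j)\right]
= \alpha^2 k^2\frac{\partial^2}{\partial x^2}\left[\frac{\partial^2 u}{\partial t^2}(x_i, t_j)\right] \\
&= \alpha^2 k^2\frac{\partial^2}{\partial x^2}\left[\alpha^2\frac{\partial^2 u}{\partial x^2}(x_i, t_j)\right]
= \alpha^4 k^2\frac{\partial^4 u}{\partial x^4}(x_i, t_j),
\end{aligned}
\]

and we see that since $\lambda^2 = (\alpha^2 k^2/h^2) = 1$, we have

\[
\frac{1}{4!}\left[k^2\frac{\partial^4 u}{\partial t^4}(x_i, t_j) - \alpha^2 h^2\frac{\partial^4 u}{\partial x^4}(x_i, t_j)\right]
= \frac{\alpha^2}{4!}\left[\alpha^2 k^2 - h^2\right]\frac{\partial^4 u}{\partial x^4}(x_i, t_j) = 0.
\]

Continuing in this manner, all the terms on the right-hand side of (12.26) are 0, implying
that the local truncation error is 0. The only errors in Example 1 are those due to the
approximation of $w_{i,1}$ and to round-off.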
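The vanishing of the local truncation error when $\lambda = 1$ can be observed directly by substituting the exact solution of Example 1 into the difference equation. The following is a small sketch under that assumption; the helper name `residual` is hypothetical, and the quantity it returns is $k^2$ times the left-hand side of (12.26).

```python
import math

def residual(x, t, h, k, alpha):
    """Difference-equation residual for the exact solution of Example 1.

    Computes [u(x,t+k) - 2u(x,t) + u(x,t-k)]
           - lambda^2 [u(x+h,t) - 2u(x,t) + u(x-h,t)],
    which is k^2 times the left-hand side of (12.26).
    """
    u = lambda x, t: math.sin(math.pi * x) * math.cos(2 * math.pi * t)
    lam2 = (alpha * k / h) ** 2
    time_part = u(x, t + k) - 2 * u(x, t) + u(x, t - k)
    space_part = u(x + h, t) - 2 * u(x, t) + u(x - h, t)
    return time_part - lam2 * space_part

# lambda = 1 (k = h / alpha): residual is zero up to round-off.
r_lam_one = residual(0.3, 0.2, h=0.1, k=0.05, alpha=2.0)
# lambda = 1/2: the usual O(k^2 + h^2) truncation error reappears.
r_lam_half = residual(0.3, 0.2, h=0.1, k=0.025, alpha=2.0)

print("lambda = 1  :", r_lam_one)
print("lambda = 1/2:", r_lam_half)
```

The sample point $(x, t) = (0.3, 0.2)$ is arbitrary; any interior point should show the same contrast.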

As in the case of the Forward-Difference method for the heat equation, the Explicit
Finite-Difference method for the wave equation has stability problems. In fact, it is necessary
that $\lambda = \alpha k/h \le 1$ for the method to be stable. (See [IK], p. 489.) The explicit method
given in Algorithm 12.4, with $\lambda \le 1$, is $O(h^2 + k^2)$ convergent if $f$ and $g$ are sufficiently
differentiable. For verification of this, see [IK], p. 491.
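The stability restriction is easy to observe experimentally. The sketch below runs the explicit scheme with a triangular ("plucked string") initial shape and zero initial velocity, both chosen here only for illustration because the shape contains high-frequency components, and records the largest value produced for $\lambda = 1$ and for $\lambda = 1.5$. The function name `max_amplitude` is hypothetical.

```python
def max_amplitude(lam, m=10, steps=40):
    """Run the explicit wave scheme on [0, 1] and return the largest |w| seen.

    lam is the ratio alpha*k/h. The initial data (hypothetical) are
    f(x) = min(x, 1 - x) and g(x) = 0.
    """
    h = 1.0 / m
    f = [min(i * h, 1 - i * h) for i in range(m + 1)]
    prev = f[:]                     # time level j - 1
    curr = f[:]                     # time level j (end values stay 0)
    for i in range(1, m):           # first step via Eq. (12.25) with g = 0
        curr[i] = (1 - lam**2) * f[i] + 0.5 * lam**2 * (f[i + 1] + f[i - 1])
    biggest = max(abs(v) for v in curr)
    for _ in range(steps - 1):
        nxt = [0.0] * (m + 1)
        for i in range(1, m):
            nxt[i] = (2 * (1 - lam**2) * curr[i]
                      + lam**2 * (curr[i + 1] + curr[i - 1])
                      - prev[i])
        prev, curr = curr, nxt
        biggest = max(biggest, max(abs(v) for v in nxt))
    return biggest

stable = max_amplitude(1.0)
unstable = max_amplitude(1.5)
print("lambda = 1.0:", stable)
print("lambda = 1.5:", unstable)
```

With $\lambda = 1$ the values stay bounded by the initial amplitude, while with $\lambda = 1.5$ the highest-frequency grid modes are amplified at every step and the computed values grow without bound.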

Although we will not discuss them, there are implicit methods that are unconditionally 
stable. A discussion of these methods can be found in [Am], p. 199, [Mi], or [Sm,G]. 


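EXERCISE SET 12.3

1. Approximate the solution to the wave equation

\[
\begin{aligned}
&\frac{\partial^2 u}{\partial t^2} - \frac{\partial^2 u}{\partial x^2} = 0, \quad 0 < x < 1, \ 0 < t; \\
&u(0, t) = u(1, t) = 0, \quad 0 < t, \\
&u(x, 0) = \sin \pi x, \quad 0 \le x \le 1, \\
&\frac{\partial u}{\partial t}(x, 0) = 0, \quad 0 \le x \le 1,
\end{aligned}
\]

using the Finite-Difference Algorithm 12.4 with $m = 4$, $N = 4$, and $T = 1.0$. Compare your results
at $t = 1.0$ to the actual solution $u(x, t) = \cos \pi t \sin \pi x$.

2. Approximate the solution to the wave equation

\[
\begin{aligned}
&\frac{\partial^2 u}{\partial t^2} - \frac{1}{16\pi^2}\frac{\partial^2 u}{\partial x^2} = 0, \quad 0 < x < 0.5, \ 0 < t; \\
&u(0, t) = u(0.5, t) = 0, \quad 0 < t, \\
&u(x, 0) = 0, \quad 0 \le x \le 0.5, \\
&\frac{\partial u}{\partial t}(x, 0) = \sin 4\pi x, \quad 0 \le x \le 0.5,
\end{aligned}
\]

using the Finite-Difference Algorithm 12.4 with $m = 4$, $N = 4$ and $T = 0.5$. Compare your results
at $t = 0.5$ to the actual solution $u(x, t) = \sin t \sin 4\pi x$.

3. Approximate the solution to the wave equation

\[
\begin{aligned}
&\frac{\partial^2 u}{\partial t^2} - \frac{\partial^2 u}{\partial x^2} = 0, \quad 0 < x < \pi, \ 0 < t; \\
&u(0, t) = u(\pi, t) = 0, \quad 0 < t, \\
&u(x, 0) = \sin x, \quad 0 \le x \le \pi, \\
&\frac{\partial u}{\partial t}(x, 0) = 0, \quad 0 \le x \le \pi,
\end{aligned}
\]

using the Finite-Difference Algorithm with $h = \pi/10$ and $k = 0.05$, with $h = \pi/20$ and $k = 0.1$,
and then with $h = \pi/20$ and $k = 0.05$. Compare your results at $t = 0.5$ to the actual solution
$u(x, t) = \cos t \sin x$.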




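4. Repeat Exercise 3, using in Step 4 of Algorithm 12.4 the approximation

\[ w_{i,1} = w_{i,0} + k g(x_i), \quad \text{for each } i = 1, \ldots, m-1. \]

5. Approximate the solution to the wave equation

\[
\begin{aligned}
&\frac{\partial^2 u}{\partial t^2} - \frac{\partial^2 u}{\partial x^2} = 0, \quad 0 < x < 1, \ 0 < t; \\
&u(0, t) = u(1, t) = 0, \quad 0 < t, \\
&u(x, 0) = \sin 2\pi x, \quad 0 \le x \le 1, \\
&\frac{\partial u}{\partial t}(x, 0) = 2\pi \sin 2\pi x, \quad 0 \le x \le 1,
\end{aligned}
\]

using Algorithm 12.4 with $h = 0.1$ and $k = 0.1$. Compare your results at $t = 0.3$ to the actual solution
$u(x, t) = \sin 2\pi x\,(\cos 2\pi t + \sin 2\pi t)$.

6. Approximate the solution to the wave equation

\[
\begin{aligned}
&\frac{\partial^2 u}{\partial t^2} - \frac{\partial^2 u}{\partial x^2} = 0, \quad 0 < x < 1, \ 0 < t; \\
&u(0, t) = u(1, t) = 0, \quad 0 < t,
\end{aligned}
\]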


1 
u(x, 0) = a Osx 


using Algorithm 12.4 with h = 0.1 and k = 0.1. 
7. The air pressure $p(x, t)$ in an organ pipe is governed by the wave equation

\[ \frac{\partial^2 p}{\partial x^2} = \frac{1}{c^2}\frac{\partial^2 p}{\partial t^2}, \quad 0 < x < l, \ 0 < t, \]

where $l$ is the length of the pipe, and $c$ is a physical constant. If the pipe is open, the boundary
conditions are given by

\[ p(0, t) = p_0 \quad \text{and} \quad p(l, t) = p_0. \]

If the pipe is closed at the end where $x = l$, the boundary conditions are

\[ p(0, t) = p_0 \quad \text{and} \quad \frac{\partial p}{\partial x}(l, t) = 0. \]

Assume that $c = 1$, $l = 1$, and the initial conditions are

\[ p(x, 0) = p_0 \cos 2\pi x, \quad \text{and} \quad \frac{\partial p}{\partial t}(x, 0) = 0, \quad 0 \le x \le 1. \]

a. Approximate the pressure for an open pipe with $p_0 = 0.9$ at $x = \frac{1}{2}$ for $t = 0.5$ and $t = 1$, using
Algorithm 12.4 with $h = k = 0.1$.

b. Modify Algorithm 12.4 for the closed-pipe problem with $p_0 = 0.9$, and approximate $p(0.5, 0.5)$
and $p(0.5, 1)$ using $h = k = 0.1$.

8. In an electric transmission line of length $l$ that carries alternating current of high frequency (called a
"lossless" line), the voltage $V$ and current $i$ are described by

\[
\begin{aligned}
\frac{\partial^2 V}{\partial x^2} &= LC\frac{\partial^2 V}{\partial t^2}, \quad 0 < x < l, \ 0 < t; \\
\frac{\partial^2 i}{\partial x^2} &= LC\frac{\partial^2 i}{\partial t^2}, \quad 0 < x < l, \ 0 < t;
\end{aligned}
\]

where $L$ is the inductance per unit length, and $C$ is the capacitance per unit length. Suppose the line
is 200 ft long and the constants $C$ and $L$ are given by

\[ C = 0.1 \text{ farads/ft} \quad \text{and} \quad L = 0.3 \text{ henries/ft}. \]

Suppose the voltage and current also satisfy

\[
\begin{aligned}
&V(0, t) = V(200, t) = 0, \quad 0 < t; \\
&V(x, 0) = 110 \sin\frac{\pi x}{200}, \quad 0 \le x \le 200; \\
&\frac{\partial V}{\partial t}(x, 0) = 0, \quad 0 \le x \le 200; \\
&i(0, t) = i(200, t) = 0, \quad 0 < t; \\
&i(x, 0) = 5.5 \cos\frac{\pi x}{200}, \quad 0 \le x \le 200;
\end{aligned}
\]

and

\[ \frac{\partial i}{\partial t}(x, 0) = 0, \quad 0 \le x \le 200. \]

Approximate the voltage and current at $t = 0.2$ and $t = 0.5$ using Algorithm 12.4 with $h = 10$ and
$k = 0.1$.


12.4 An Introduction to the Finite-Element Method

The Finite-Element method is similar to the Rayleigh-Ritz method for approximating
the solution to two-point boundary-value problems that was introduced in Section 11.5.
It was originally developed for use in civil engineering, but it is now used for approximating
the solutions to partial differential equations that arise in all areas of applied
mathematics.

(Margin note: Finite elements began in the 1950s in the aircraft industry. Use of the techniques followed a
paper by Turner, Clough, Martin, and Topp [TCMT] that was [...] required large computer
resources that were not available until the early 1970s.)

One advantage the Finite-Element method has over finite-difference methods is the relative
ease with which the boundary conditions of the problem are handled. Many physical
problems have boundary conditions involving derivatives and irregularly shaped boundaries.
Boundary conditions of this type are difficult to handle using finite-difference techniques
because each boundary condition involving a derivative must be approximated by a difference
quotient at the grid points, and irregular shaping of the boundary makes placing the
grid points difficult. The Finite-Element method includes the boundary conditions as integrals
in a functional that is being minimized, so the construction procedure is independent
of the particular boundary conditions of the problem.

In our discussion, we consider the partial differential equation

\[ \frac{\partial}{\partial x}\left(p(x, y)\frac{\partial u}{\partial x}\right) + \frac{\partial}{\partial y}\left(q(x, y)\frac{\partial u}{\partial y}\right) + r(x, y)u(x, y) = f(x, y), \tag{12.27} \]

with $(x, y) \in D$, where $D$ is a plane region with boundary $S$.
Boundary conditions of the form

\[ u(x, y) = g(x, y) \tag{12.28} \]

are imposed on a portion, $S_1$, of the boundary. On the remainder of the boundary, $S_2$, the
solution $u(x, y)$ is required to satisfy

\[ p(x, y)\frac{\partial u}{\partial x}(x, y)\cos\theta_1 + q(x, y)\frac{\partial u}{\partial y}(x, y)\cos\theta_2 + g_1(x, y)u(x, y) = g_2(x, y), \tag{12.29} \]

where $\theta_1$ and $\theta_2$ are the direction angles of the outward normal to the boundary at the point
$(x, y)$. (See Figure 12.13.)


Figure 12.13 (labels: tangent line, normal line)


Physical problems in the areas of solid mechanics and elasticity have associated partial
differential equations similar to Eq. (12.27). The solution to a problem of this type typically
minimizes a certain functional, involving integrals, over a class of functions determined by
the problem.

Suppose $p$, $q$, $r$, and $f$ are all continuous on $D \cup S$, $p$ and $q$ have continuous first partial
derivatives, and $g_1$ and $g_2$ are continuous on $S_2$. Suppose, in addition, that $p(x, y) > 0$,
$q(x, y) > 0$, $r(x, y) \le 0$, and $g_1(x, y) > 0$. Then a solution to Eq. (12.27) uniquely minimizes
the functional

\[
\begin{aligned}
I[w] = {} & \iint_D \left\{\frac{1}{2}\left[p(x, y)\left(\frac{\partial w}{\partial x}\right)^2 + q(x, y)\left(\frac{\partial w}{\partial y}\right)^2 - r(x, y)w^2\right] + f(x, y)w\right\} dx\, dy \\
& + \int_{S_2} \left\{-g_2(x, y)w + \frac{1}{2}g_1(x, y)w^2\right\} dS
\end{aligned}
\tag{12.30}
\]

over all twice continuously-differentiable functions $w$ satisfying Eq. (12.28) on $S_1$. The
Finite-Element method approximates this solution by minimizing the functional $I$ over a
smaller class of functions, just as the Rayleigh-Ritz method did for the boundary-value
problem considered in Section 11.5.


Defining the Elements 


The first step is to divide the region into a finite number of sections, or elements, of a regular
shape, either rectangles or triangles. (See Figure 12.14.)

Figure 12.14

The set of functions used for approximation is generally a set of piecewise polynomials
of fixed degree in $x$ and $y$, and the approximation requires that the polynomials be pieced
together in such a manner that the resulting function is continuous with an integrable or
continuous first or second derivative on the entire region. Polynomials of linear type in $x$
and $y$,

\[ \phi(x, y) = a + bx + cy, \]

are commonly used with triangular elements, whereas polynomials of bilinear type in $x$
and $y$,

\[ \phi(x, y) = a + bx + cy + dxy, \]

are used with rectangular elements.

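Suppose that the region $D$ has been subdivided into triangular elements. The collection
of triangles is denoted $\mathcal{D}$, and the vertices of these triangles are called nodes. The method
seeks an approximation of the form

\[ \phi(x, y) = \sum_{i=1}^{m} \gamma_i \phi_i(x, y), \tag{12.31} \]

where $\phi_1, \phi_2, \ldots, \phi_m$ are linearly independent piecewise-linear polynomials, and $\gamma_1,
\gamma_2, \ldots, \gamma_m$ are constants. Some of these constants, for example, $\gamma_{n+1}, \gamma_{n+2}, \ldots, \gamma_m$, are
used to ensure that the boundary condition,

\[ \phi(x, y) = g(x, y), \]

is satisfied on $S_1$, and the remaining constants, $\gamma_1, \gamma_2, \ldots, \gamma_n$, are used to minimize the
functional $I\left[\sum_{i=1}^{m} \gamma_i \phi_i\right]$.

Inserting the form of $\phi(x, y)$ given in Eq. (12.31) for $w$ in Eq. (12.30) produces

\[
\begin{aligned}
I[\phi] = {} & I\left[\sum_{i=1}^{m} \gamma_i \phi_i\right] \\
= {} & \iint_D \left(\frac{1}{2}\left\{p(x, y)\left[\sum_{i=1}^{m} \gamma_i \frac{\partial \phi_i}{\partial x}(x, y)\right]^2 + q(x, y)\left[\sum_{i=1}^{m} \gamma_i \frac{\partial \phi_i}{\partial y}(x, y)\right]^2\right.\right. \\
& \left.\left. {} - r(x, y)\left[\sum_{i=1}^{m} \gamma_i \phi_i(x, y)\right]^2\right\} + f(x, y)\sum_{i=1}^{m} \gamma_i \phi_i(x, y)\right) dy\, dx \\
& + \int_{S_2} \left(-g_2(x, y)\sum_{i=1}^{m} \gamma_i \phi_i(x, y) + \frac{1}{2}g_1(x, y)\left[\sum_{i=1}^{m} \gamma_i \phi_i(x, y)\right]^2\right) dS.
\end{aligned}
\tag{12.32}
\]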
2 i=l i=l 
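Consider $I$ as a function of $\gamma_1, \gamma_2, \ldots, \gamma_n$. For a minimum to occur we must have

\[ \frac{\partial I}{\partial \gamma_j} = 0, \quad \text{for each } j = 1, 2, \ldots, n. \]

Differentiating (12.32) gives

\[
\begin{aligned}
\frac{\partial I}{\partial \gamma_j} = {} & \iint_D \left\{p(x, y)\sum_{i=1}^{m} \gamma_i \frac{\partial \phi_i}{\partial x}(x, y)\frac{\partial \phi_j}{\partial x}(x, y) + q(x, y)\sum_{i=1}^{m} \gamma_i \frac{\partial \phi_i}{\partial y}(x, y)\frac{\partial \phi_j}{\partial y}(x, y)\right. \\
& \left. {} - r(x, y)\sum_{i=1}^{m} \gamma_i \phi_i(x, y)\phi_j(x, y) + f(x, y)\phi_j(x, y)\right\} dx\, dy \\
& + \int_{S_2} \left\{-g_2(x, y)\phi_j(x, y) + g_1(x, y)\sum_{i=1}^{m} \gamma_i \phi_i(x, y)\phi_j(x, y)\right\} dS,
\end{aligned}
\]

so

\[
\begin{aligned}
0 = {} & \sum_{i=1}^{m}\left[\iint_D \left\{p(x, y)\frac{\partial \phi_i}{\partial x}(x, y)\frac{\partial \phi_j}{\partial x}(x, y) + q(x, y)\frac{\partial \phi_i}{\partial y}(x, y)\frac{\partial \phi_j}{\partial y}(x, y) - r(x, y)\phi_i(x, y)\phi_j(x, y)\right\} dx\, dy\right. \\
& \left. {} + \int_{S_2} g_1(x, y)\phi_i(x, y)\phi_j(x, y)\, dS\right]\gamma_i
+ \iint_D f(x, y)\phi_j(x, y)\, dx\, dy - \int_{S_2} g_2(x, y)\phi_j(x, y)\, dS,
\end{aligned}
\]

for each $j = 1, 2, \ldots, n$. This set of equations can be written as a linear system:

\[ A\mathbf{c} = \mathbf{b}, \]

where $\mathbf{c} = (\gamma_1, \ldots, \gamma_n)^t$, and where $A = (\alpha_{ij})$ and $\mathbf{b} = (\beta_1, \ldots, \beta_n)^t$ are defined by

\[
\begin{aligned}
\alpha_{ij} = {} & \iint_D \left[p(x, y)\frac{\partial \phi_i}{\partial x}(x, y)\frac{\partial \phi_j}{\partial x}(x, y) + q(x, y)\frac{\partial \phi_i}{\partial y}(x, y)\frac{\partial \phi_j}{\partial y}(x, y) - r(x, y)\phi_i(x, y)\phi_j(x, y)\right] dx\, dy \\
& + \int_{S_2} g_1(x, y)\phi_i(x, y)\phi_j(x, y)\, dS,
\end{aligned}
\tag{12.33}
\]

for each $i = 1, 2, \ldots, n$ and $j = 1, 2, \ldots, m$, and

\[ \beta_i = -\iint_D f(x, y)\phi_i(x, y)\, dx\, dy + \int_{S_2} g_2(x, y)\phi_i(x, y)\, dS - \sum_{k=n+1}^{m} \alpha_{ik}\gamma_k, \tag{12.34} \]

for each $i = 1, \ldots, n$.

The particular choice of basis functions is important because the appropriate choice
can often make the matrix $A$ positive definite and banded. For the second-order problem
(12.27), we assume that $D$ is polygonal, so that $D = \mathcal{D}$, and that $S$ is a contiguous set of
straight lines.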


Triangulating the Region 


To begin the procedure, we divide the region $D$ into a collection of triangles $T_1, T_2, \ldots, T_M$,
with the $i$th triangle having three vertices, or nodes, denoted

\[ V_j^{(i)} = \left(x_j^{(i)}, y_j^{(i)}\right), \quad \text{for } j = 1, 2, 3. \]

To simplify the notation, we write $V_j^{(i)}$ simply as $V_j = (x_j, y_j)$ when working with the fixed
triangle $T_i$. With each vertex $V_j$ we associate a linear polynomial

\[
N_j^{(i)}(x, y) \equiv N_j(x, y) = a_j + b_j x + c_j y, \quad \text{where} \quad
N_j^{(i)}(x_k, y_k) = \begin{cases} 1, & \text{if } j = k, \\ 0, & \text{if } j \ne k. \end{cases}
\]

This produces linear systems of the form

\[
\begin{bmatrix} 1 & x_1 & y_1 \\ 1 & x_2 & y_2 \\ 1 & x_3 & y_3 \end{bmatrix}
\begin{bmatrix} a_j \\ b_j \\ c_j \end{bmatrix}
=
\begin{bmatrix} 0 \\ 1 \\ 0 \end{bmatrix},
\]

with the element 1 occurring in the $j$th row in the vector on the right (here $j = 2$).

Let $E_1, \ldots, E_n$ be a labeling of the nodes lying in $D \cup S$. With each node $E_k$, we associate
a function $\phi_k$ that is linear on each triangle, has the value 1 at $E_k$, and is 0 at each of the
other nodes. This choice makes $\phi_k$ identical to $N_j^{(i)}$ on triangle $T_i$ when the node $E_k$ is the
vertex denoted $V_j^{(i)}$.


Illustration Suppose that a finite-element problem contains the triangles $T_1$ and $T_2$ shown in Figure
12.15.

The linear function $N_1^{(1)}(x, y)$ that assumes the value 1 at $(1, 1)$ and the value 0 at both $(0, 0)$
and $(-1, 2)$ satisfies

\[
\begin{aligned}
a_1^{(1)} + b_1^{(1)}(1) + c_1^{(1)}(1) &= 1, \\
a_1^{(1)} + b_1^{(1)}(-1) + c_1^{(1)}(2) &= 0,
\end{aligned}
\]

and

\[ a_1^{(1)} + b_1^{(1)}(0) + c_1^{(1)}(0) = 0. \]

The solution to this system is $a_1^{(1)} = 0$, $b_1^{(1)} = \frac{2}{3}$, and $c_1^{(1)} = \frac{1}{3}$, so

\[ N_1^{(1)}(x, y) = \frac{2}{3}x + \frac{1}{3}y. \]

In a similar manner, the linear function $N_1^{(2)}(x, y)$ that assumes the value 1 at $(1, 1)$ and the
value 0 at both $(0, 0)$ and $(1, 0)$ satisfies

\[
\begin{aligned}
a_1^{(2)} + b_1^{(2)}(1) + c_1^{(2)}(1) &= 1, \\
a_1^{(2)} + b_1^{(2)}(0) + c_1^{(2)}(0) &= 0,
\end{aligned}
\]

and

\[ a_1^{(2)} + b_1^{(2)}(1) + c_1^{(2)}(0) = 0. \]

This implies that $a_1^{(2)} = 0$, $b_1^{(2)} = 0$, and $c_1^{(2)} = 1$. As a consequence, $N_1^{(2)}(x, y) = y$. Note
that $N_1^{(1)}(x, y) = N_1^{(2)}(x, y)$ on the common boundary of $T_1$ and $T_2$ because $y = x$.
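The 3-by-3 systems in this Illustration have a closed-form solution (the same formulas appear in Step 3 of Algorithm 12.5). The following is a small sketch that evaluates those formulas with exact rational arithmetic; the function name `shape_coefficients` is hypothetical, and the vertex ordering for each triangle is the one assumed in the Illustration, with $(1, 1)$ listed first.

```python
from fractions import Fraction

def shape_coefficients(vertices):
    """Return [(a_j, b_j, c_j) for j = 1, 2, 3] for one triangle.

    N_j(x, y) = a_j + b_j*x + c_j*y equals 1 at vertex j and 0 at the
    other two vertices.
    """
    (x1, y1), (x2, y2), (x3, y3) = [(Fraction(x), Fraction(y)) for x, y in vertices]
    # Determinant of [[1, x1, y1], [1, x2, y2], [1, x3, y3]]
    delta = (x2 * y3 - x3 * y2) - (x1 * y3 - x3 * y1) + (x1 * y2 - x2 * y1)
    if delta == 0:
        raise ValueError("the three vertices are collinear")
    return [
        ((x2 * y3 - y2 * x3) / delta, (y2 - y3) / delta, (x3 - x2) / delta),
        ((x3 * y1 - y3 * x1) / delta, (y3 - y1) / delta, (x1 - x3) / delta),
        ((x1 * y2 - y1 * x2) / delta, (y1 - y2) / delta, (x2 - x1) / delta),
    ]

T1 = [(1, 1), (-1, 2), (0, 0)]   # triangle T_1 of the Illustration
T2 = [(1, 1), (0, 0), (1, 0)]    # triangle T_2 of the Illustration
n1_T1 = shape_coefficients(T1)[0]   # coefficients of N_1 on T_1
n1_T2 = shape_coefficients(T2)[0]   # coefficients of N_1 on T_2
print(n1_T1, n1_T2)
```

The two coefficient triples correspond to $N_1^{(1)}(x, y) = \frac{2}{3}x + \frac{1}{3}y$ and $N_1^{(2)}(x, y) = y$, matching the hand computation.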


Consider Figure 12.16, the upper left portion of the region shown in Figure 12.14. We
will generate the entries in the matrix $A$ that correspond to the nodes shown in this figure.

Figure 12.16

For simplicity, we assume that $E_1$ is one of the nodes on $S_1$, where the boundary
condition $u(x, y) = g(x, y)$ is imposed. The relationship between the nodes and the vertices
of the triangles for this portion is

\[ E_1 = V_3^{(1)} = V_1^{(2)}, \quad E_2 = V_1^{(1)}, \quad E_3 = V_2^{(1)} = V_2^{(2)}, \quad \text{and} \quad E_4 = V_3^{(2)}. \]


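Since $\phi_1$ and $\phi_3$ are both nonzero on $T_1$ and $T_2$, the entries $\alpha_{1,3} = \alpha_{3,1}$ are computed by

\[
\begin{aligned}
\alpha_{1,3} = {} & \iint_D \left[p\frac{\partial \phi_1}{\partial x}\frac{\partial \phi_3}{\partial x} + q\frac{\partial \phi_1}{\partial y}\frac{\partial \phi_3}{\partial y} - r\phi_1\phi_3\right] dx\, dy \\
= {} & \iint_{T_1} \left[p\frac{\partial \phi_1}{\partial x}\frac{\partial \phi_3}{\partial x} + q\frac{\partial \phi_1}{\partial y}\frac{\partial \phi_3}{\partial y} - r\phi_1\phi_3\right] dx\, dy \\
& + \iint_{T_2} \left[p\frac{\partial \phi_1}{\partial x}\frac{\partial \phi_3}{\partial x} + q\frac{\partial \phi_1}{\partial y}\frac{\partial \phi_3}{\partial y} - r\phi_1\phi_3\right] dx\, dy.
\end{aligned}
\]

On triangle $T_1$,

\[ \phi_1(x, y) = N_3^{(1)}(x, y) = a_3^{(1)} + b_3^{(1)}x + c_3^{(1)}y \]

and

\[ \phi_3(x, y) = N_2^{(1)}(x, y) = a_2^{(1)} + b_2^{(1)}x + c_2^{(1)}y, \]

so for all $(x, y)$,

\[ \frac{\partial \phi_1}{\partial x} = b_3^{(1)}, \quad \frac{\partial \phi_1}{\partial y} = c_3^{(1)}, \quad \frac{\partial \phi_3}{\partial x} = b_2^{(1)}, \quad \text{and} \quad \frac{\partial \phi_3}{\partial y} = c_2^{(1)}. \]

Similarly, on $T_2$,

\[ \phi_1(x, y) = N_1^{(2)}(x, y) = a_1^{(2)} + b_1^{(2)}x + c_1^{(2)}y \]

and

\[ \phi_3(x, y) = N_2^{(2)}(x, y) = a_2^{(2)} + b_2^{(2)}x + c_2^{(2)}y, \]

so for all $(x, y)$,

\[ \frac{\partial \phi_1}{\partial x} = b_1^{(2)}, \quad \frac{\partial \phi_1}{\partial y} = c_1^{(2)}, \quad \frac{\partial \phi_3}{\partial x} = b_2^{(2)}, \quad \text{and} \quad \frac{\partial \phi_3}{\partial y} = c_2^{(2)}. \]

Thus,

\[
\begin{aligned}
\alpha_{1,3} = {} & b_3^{(1)}b_2^{(1)}\iint_{T_1} p\, dx\, dy + c_3^{(1)}c_2^{(1)}\iint_{T_1} q\, dx\, dy \\
& - \iint_{T_1} r\left(a_3^{(1)} + b_3^{(1)}x + c_3^{(1)}y\right)\left(a_2^{(1)} + b_2^{(1)}x + c_2^{(1)}y\right) dx\, dy \\
& + b_1^{(2)}b_2^{(2)}\iint_{T_2} p\, dx\, dy + c_1^{(2)}c_2^{(2)}\iint_{T_2} q\, dx\, dy \\
& - \iint_{T_2} r\left(a_1^{(2)} + b_1^{(2)}x + c_1^{(2)}y\right)\left(a_2^{(2)} + b_2^{(2)}x + c_2^{(2)}y\right) dx\, dy.
\end{aligned}
\]

All the double integrals over $D$ reduce to double integrals over triangles. The usual
procedure is to compute all possible integrals over the triangles and accumulate them into
the correct entry $\alpha_{ij}$ in $A$. Similarly, the double integrals of the form

\[ \iint_D f(x, y)\phi_i(x, y)\, dx\, dy \]

are computed over triangles and then accumulated into the correct entry $\beta_i$ of the vector $\mathbf{b}$.
For example, to determine $\beta_1$, we need

\[
\begin{aligned}
-\iint_D f(x, y)\phi_1(x, y)\, dx\, dy = {} & -\iint_{T_1} f(x, y)\left[a_3^{(1)} + b_3^{(1)}x + c_3^{(1)}y\right] dx\, dy \\
& - \iint_{T_2} f(x, y)\left[a_1^{(2)} + b_1^{(2)}x + c_1^{(2)}y\right] dx\, dy.
\end{aligned}
\]

Because $E_1$ is a vertex of both $T_1$ and $T_2$, part of $\beta_1$ is contributed by $\phi_1$ restricted to $T_1$
and the remainder by $\phi_1$ restricted to $T_2$. In addition, nodes that lie on $S_2$ have line integrals
added to their entries in $A$ and $\mathbf{b}$.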

Algorithm 12.5 performs the Finite-Element method on a second-order elliptic differ- 
ential equation. The algorithm sets all values of the matrix A and vector b initially to 0 and, 
after all the integrations have been performed on all the triangles, adds these values to the 
appropriate entries in A and b. 


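Finite-Element

To approximate the solution to the partial differential equation

\[ \frac{\partial}{\partial x}\left(p(x, y)\frac{\partial u}{\partial x}\right) + \frac{\partial}{\partial y}\left(q(x, y)\frac{\partial u}{\partial y}\right) + r(x, y)u = f(x, y), \quad (x, y) \in D \]

subject to the boundary conditions

\[ u(x, y) = g(x, y), \quad (x, y) \in S_1 \]

and

\[ p(x, y)\frac{\partial u}{\partial x}(x, y)\cos\theta_1 + q(x, y)\frac{\partial u}{\partial y}(x, y)\cos\theta_2 + g_1(x, y)u(x, y) = g_2(x, y), \quad (x, y) \in S_2, \]

where $S_1 \cup S_2$ is the boundary of $D$, and $\theta_1$ and $\theta_2$ are the direction angles of the normal to
the boundary:

Step 0 Divide the region $D$ into triangles $T_1, \ldots, T_M$ such that:
           $T_1, \ldots, T_K$ are the triangles with no edges on $S_1$ or $S_2$;
           (Note: $K = 0$ implies that no triangle is interior to $D$.)
           $T_{K+1}, \ldots, T_N$ are the triangles with at least one edge on $S_2$;
           $T_{N+1}, \ldots, T_M$ are the remaining triangles.
           (Note: $M = N$ implies that all triangles have edges on $S_2$.)
       Label the three vertices of the triangle $T_i$ by
           $\left(x_1^{(i)}, y_1^{(i)}\right)$, $\left(x_2^{(i)}, y_2^{(i)}\right)$, and $\left(x_3^{(i)}, y_3^{(i)}\right)$.
       Label the nodes (vertices) $E_1, \ldots, E_m$ where
           $E_1, \ldots, E_n$ are in $D \cup S_2$ and $E_{n+1}, \ldots, E_m$ are on $S_1$.
           (Note: $n = m$ implies that $S_1$ contains no nodes.)

INPUT integers $K, N, M, n, m$; vertices $\left(x_1^{(i)}, y_1^{(i)}\right)$, $\left(x_2^{(i)}, y_2^{(i)}\right)$, $\left(x_3^{(i)}, y_3^{(i)}\right)$
for each $i = 1, \ldots, M$; nodes $E_j$ for each $j = 1, \ldots, m$.
(Note: All that is needed is a means of corresponding a vertex $\left(x_k^{(i)}, y_k^{(i)}\right)$ to a node $E_j = (x_j, y_j)$.)

OUTPUT constants $\gamma_1, \ldots, \gamma_m$; $a_j^{(i)}, b_j^{(i)}, c_j^{(i)}$ for each $j = 1, 2, 3$ and $i = 1, \ldots, M$.

Step 1 For $l = n+1, \ldots, m$ set $\gamma_l = g(x_l, y_l)$. (Note: $E_l = (x_l, y_l)$.)

Step 2 For $i = 1, \ldots, n$
           set $\beta_i = 0$;
           for $j = 1, \ldots, n$ set $\alpha_{i,j} = 0$.

Step 3 For $i = 1, \ldots, M$
           set $\Delta_i = \det\begin{bmatrix} 1 & x_1^{(i)} & y_1^{(i)} \\ 1 & x_2^{(i)} & y_2^{(i)} \\ 1 & x_3^{(i)} & y_3^{(i)} \end{bmatrix}$;
           $a_1^{(i)} = \dfrac{x_2^{(i)}y_3^{(i)} - y_2^{(i)}x_3^{(i)}}{\Delta_i}$; $b_1^{(i)} = \dfrac{y_2^{(i)} - y_3^{(i)}}{\Delta_i}$; $c_1^{(i)} = \dfrac{x_3^{(i)} - x_2^{(i)}}{\Delta_i}$;
           $a_2^{(i)} = \dfrac{x_3^{(i)}y_1^{(i)} - y_3^{(i)}x_1^{(i)}}{\Delta_i}$; $b_2^{(i)} = \dfrac{y_3^{(i)} - y_1^{(i)}}{\Delta_i}$; $c_2^{(i)} = \dfrac{x_1^{(i)} - x_3^{(i)}}{\Delta_i}$;
           $a_3^{(i)} = \dfrac{x_1^{(i)}y_2^{(i)} - y_1^{(i)}x_2^{(i)}}{\Delta_i}$; $b_3^{(i)} = \dfrac{y_1^{(i)} - y_2^{(i)}}{\Delta_i}$; $c_3^{(i)} = \dfrac{x_2^{(i)} - x_1^{(i)}}{\Delta_i}$;
           for $j = 1, 2, 3$
               define $N_j^{(i)}(x, y) = a_j^{(i)} + b_j^{(i)}x + c_j^{(i)}y$.

Step 4 For $i = 1, \ldots, M$ (The integrals in Steps 4 and 5 can be evaluated using
       numerical integration.)
           for $j = 1, 2, 3$
               for $k = 1, \ldots, j$ (Compute all double integrals over the triangles.)
                   set $z_{j,k}^{(i)} = b_j^{(i)}b_k^{(i)}\iint_{T_i} p(x, y)\, dx\, dy + c_j^{(i)}c_k^{(i)}\iint_{T_i} q(x, y)\, dx\, dy - \iint_{T_i} r(x, y)N_j^{(i)}(x, y)N_k^{(i)}(x, y)\, dx\, dy$;
               set $H_j^{(i)} = -\iint_{T_i} f(x, y)N_j^{(i)}(x, y)\, dx\, dy$.

Step 5 For $i = K+1, \ldots, N$ (Compute all line integrals.)
           for $j = 1, 2, 3$
               for $k = 1, \ldots, j$
                   set $J_{j,k}^{(i)} = \int_{S_2} g_1(x, y)N_j^{(i)}(x, y)N_k^{(i)}(x, y)\, dS$;
               set $I_j^{(i)} = \int_{S_2} g_2(x, y)N_j^{(i)}(x, y)\, dS$.

Step 6 For $i = 1, \ldots, M$ do Steps 7-12. (Assembling the integrals over each triangle
       into the linear system.)

   Step 7 For $k = 1, 2, 3$ do Steps 8-12.

      Step 8 Find $l$ so that $E_l = \left(x_k^{(i)}, y_k^{(i)}\right)$.

      Step 9 If $k > 1$ then for $j = 1, \ldots, k-1$ do Steps 10, 11.

         Step 10 Find $t$ so that $E_t = \left(x_j^{(i)}, y_j^{(i)}\right)$.

         Step 11 If $l \le n$ then
                     if $t \le n$ then set $\alpha_{lt} = \alpha_{lt} + z_{k,j}^{(i)}$;
                                           $\alpha_{tl} = \alpha_{tl} + z_{k,j}^{(i)}$
                     else set $\beta_l = \beta_l - \gamma_t z_{k,j}^{(i)}$
                 else
                     if $t \le n$ then set $\beta_t = \beta_t - \gamma_l z_{k,j}^{(i)}$.

      Step 12 If $l \le n$ then set $\alpha_{ll} = \alpha_{ll} + z_{k,k}^{(i)}$;
                                     $\beta_l = \beta_l + H_k^{(i)}$.

Step 13 For $i = K+1, \ldots, N$ do Steps 14-19. (Assembling the line integrals
        into the linear system.)

   Step 14 For $k = 1, 2, 3$ do Steps 15-19.

      Step 15 Find $l$ so that $E_l = \left(x_k^{(i)}, y_k^{(i)}\right)$.

      Step 16 If $k > 1$ then for $j = 1, \ldots, k-1$ do Steps 17, 18.

         Step 17 Find $t$ so that $E_t = \left(x_j^{(i)}, y_j^{(i)}\right)$.

         Step 18 If $l \le n$ then
                     if $t \le n$ then set $\alpha_{lt} = \alpha_{lt} + J_{k,j}^{(i)}$;
                                           $\alpha_{tl} = \alpha_{tl} + J_{k,j}^{(i)}$
                     else set $\beta_l = \beta_l - \gamma_t J_{k,j}^{(i)}$
                 else
                     if $t \le n$ then set $\beta_t = \beta_t - \gamma_l J_{k,j}^{(i)}$.

      Step 19 If $l \le n$ then set $\alpha_{ll} = \alpha_{ll} + J_{k,k}^{(i)}$;
                                     $\beta_l = \beta_l + I_k^{(i)}$.

Step 20 Solve the linear system $A\mathbf{c} = \mathbf{b}$ where $A = (\alpha_{l,t})$, $\mathbf{b} = (\beta_l)$ and $\mathbf{c} = (\gamma_t)$ for
        $1 \le l \le n$ and $1 \le t \le n$.

Step 21 OUTPUT $(\gamma_1, \ldots, \gamma_m)$.
        (For each $k = 1, \ldots, m$ let $\phi_k = N_j^{(i)}$ on $T_i$ if $E_k = \left(x_j^{(i)}, y_j^{(i)}\right)$.
        Then $\phi(x, y) = \sum_{k=1}^{m} \gamma_k \phi_k(x, y)$ approximates $u(x, y)$ on $D \cup S_1 \cup S_2$.)

Step 22 For $i = 1, \ldots, M$
            for $j = 1, 2, 3$ OUTPUT $\left(a_j^{(i)}, b_j^{(i)}, c_j^{(i)}\right)$.

Step 23 STOP. (The procedure is complete.)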
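Steps 3 and 4 simplify considerably in the special case $p = q = 1$ and $r = 0$ (Laplace's or Poisson's equation), because the double integrals of $p$ and $q$ are then just the area of the triangle. The following sketch computes the resulting 3-by-3 array of values $z_{j,k}$ for a single triangle under that assumption; it does not cover the general case, which needs numerical integration, and the function name `element_matrix` is hypothetical. Unlike Step 4, it fills in both halves of the symmetric array.

```python
from fractions import Fraction

def element_matrix(vertices):
    """z[j][k] = (b_j*b_k + c_j*c_k) * area for one triangle.

    This is Step 4 of the algorithm specialized to p = q = 1 and r = 0;
    b_j and c_j are the coefficients from Step 3.
    """
    (x1, y1), (x2, y2), (x3, y3) = [(Fraction(x), Fraction(y)) for x, y in vertices]
    delta = (x2 * y3 - x3 * y2) - (x1 * y3 - x3 * y1) + (x1 * y2 - x2 * y1)
    if delta == 0:
        raise ValueError("the three vertices are collinear")
    b = [(y2 - y3) / delta, (y3 - y1) / delta, (y1 - y2) / delta]
    c = [(x3 - x2) / delta, (x1 - x3) / delta, (x2 - x1) / delta]
    area = abs(delta) / 2
    return [[(b[j] * b[k] + c[j] * c[k]) * area for k in range(3)] for j in range(3)]

# A right triangle chosen only as a test case.
z = element_matrix([(0, 0), (1, 0), (0, 1)])
for row in z:
    print([str(v) for v in row])
```

Each row sums to zero because $N_1 + N_2 + N_3 = 1$ on the triangle, which gives a quick sanity check on any implementation of Step 3.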


Illustration The temperature, $u(x, y)$, in a two-dimensional region $D$ satisfies Laplace's equation

\[ \frac{\partial^2 u}{\partial x^2}(x, y) + \frac{\partial^2 u}{\partial y^2}(x, y) = 0 \quad \text{on } D. \]

Consider the region $D$ shown in Figure 12.17 with boundary conditions given by

\[
\begin{aligned}
u(x, y) &= 4, && \text{for } (x, y) \in L_6 \text{ and } (x, y) \in L_7; \\
\frac{\partial u}{\partial n}(x, y) &= x, && \text{for } (x, y) \in L_2 \text{ and } (x, y) \in L_4; \\
\frac{\partial u}{\partial n}(x, y) &= y, && \text{for } (x, y) \in L_5; \\
\frac{\partial u}{\partial n}(x, y) &= \frac{x + y}{\sqrt{2}}, && \text{for } (x, y) \in L_1 \text{ and } (x, y) \in L_3,
\end{aligned}
\]

where $\partial u/\partial n$ denotes the directional derivative in the direction of the normal $\mathbf{n}$ to the
boundary of the region $D$ at the point $(x, y)$.

Figure 12.17

We first subdivide $D$ into triangles with the labeling suggested in Step 0 of the algorithm.
For this example, $S_1 = L_6 \cup L_7$ and $S_2 = L_1 \cup L_2 \cup L_3 \cup L_4 \cup L_5$. The labeling of triangles
is shown in Figure 12.18.

The boundary condition $u(x, y) = 4$ on $L_6$ and $L_7$ implies that $\gamma_t = 4$ when $t = 6, 7, \ldots, 11$,
that is, at the nodes $E_6, E_7, \ldots, E_{11}$. To determine the values of $\gamma_l$ for $l = 1, 2, \ldots, 5$, apply
the remaining steps of the algorithm and generate the matrix

\[
A = \begin{bmatrix}
2.5 & 0 & -1 & 0 & 0 \\
0 & 1.5 & -1 & -0.5 & 0 \\
-1 & -1 & 4 & 0 & 0 \\
0 & -0.5 & 0 & 2.5 & -0.5 \\
0 & 0 & 0 & -0.5 & 1
\end{bmatrix}
\]

and the vector

\[
\mathbf{b} = \begin{bmatrix} 6.0666 \\ 0.0633 \\ 8.0000 \\ 6.0566 \\ 2.0316 \end{bmatrix}.
\]

The solution to the equation $A\mathbf{c} = \mathbf{b}$ is

\[
\mathbf{c} = \begin{bmatrix} \gamma_1 \\ \gamma_2 \\ \gamma_3 \\ \gamma_4 \\ \gamma_5 \end{bmatrix}
= \begin{bmatrix} 4.0383 \\ 4.0782 \\ 4.0291 \\ 4.0496 \\ 4.0565 \end{bmatrix}.
\]
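The 5-by-5 system above is small enough to check directly. The following is a minimal sketch using Gaussian elimination with partial pivoting written in plain Python; the function name `solve` is illustrative, and any linear solver could be substituted. The matrix and right-hand side are the values printed in the text, which are rounded to four decimal places.

```python
def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]   # augmented copy
    for col in range(n):
        # Bring the largest remaining entry of this column to the pivot row
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for cc in range(col, n + 1):
                M[r][cc] -= factor * M[col][cc]
    # Back substitution
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(M[r][cc] * x[cc] for cc in range(r + 1, n))
        x[r] = (M[r][n] - s) / M[r][r]
    return x

A = [[2.5, 0.0, -1.0, 0.0, 0.0],
     [0.0, 1.5, -1.0, -0.5, 0.0],
     [-1.0, -1.0, 4.0, 0.0, 0.0],
     [0.0, -0.5, 0.0, 2.5, -0.5],
     [0.0, 0.0, 0.0, -0.5, 1.0]]
b = [6.0666, 0.0633, 8.0000, 6.0566, 2.0316]

gamma = solve(A, b)
book = [4.0383, 4.0782, 4.0291, 4.0496, 4.0565]   # values printed in the text
for g, c in zip(gamma, book):
    print(round(g, 4), c)
```

Because the printed entries of $\mathbf{b}$ are rounded, the recomputed values should be expected to agree with the printed $\gamma_l$ only to about three decimal places.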


Solving this system gives the following approximation to the solution of Laplace's equation
and the boundary conditions on the respective triangles:

$T_1$: $\phi(x, y) = 4.0383(1 - 5x + 5y) + 4.0291(-2 + 10x) + 4(2 - 5x - 5y)$,
$T_2$: $\phi(x, y) = 4.0782(-2 + 5x + 5y) + 4.0291(4 - 10x) + 4(-1 + 5x - 5y)$,
$T_3$: $\phi(x, y) = 4(-1 + 5y) + 4(2 - 5x - 5y) + 4.0383(5x)$,
$T_4$: $\phi(x, y) = 4.0383(1 - 5x + 5y) + 4.0782(-2 + 5x + 5y) + 4.0291(2 - 10y)$,
$T_5$: $\phi(x, y) = 4.0782(2 - 5x + 5y) + 4.0496(-4 + 10x) + 4(3 - 5x - 5y)$,
$T_6$: $\phi(x, y) = 4.0496(6 - 10x) + 4.0565(-6 + 10x + 10y) + 4(1 - 10y)$,
$T_7$: $\phi(x, y) = 4(-5x + 5y) + 4.0383(5x) + 4(1 - 5y)$,
$T_8$: $\phi(x, y) = 4.0383(5y) + 4(1 - 5x) + 4(5x - 5y)$,
$T_9$: $\phi(x, y) = 4.0291(10y) + 4(2 - 5x - 5y) + 4(-1 + 5x - 5y)$,
$T_{10}$: $\phi(x, y) = 4.0496(10y) + 4(3 - 5x - 5y) + 4(-2 + 5x - 5y)$.

The actual solution to the boundary-value problem is $u(x, y) = xy + 4$. Table 12.7 compares
the value of $u$ to the value of $\phi$ at $E_i$, for each $i = 1, \ldots, 5$.

Table 12.7

x      y      φ(x, y)    u(x, y)    |φ(x, y) − u(x, y)|
0.2    0.2    4.0383     4.04       0.0017
0.4    0.2    4.0782     4.08       0.0018
0.3    0.1    4.0291     4.03       0.0009
0.5    0.1    4.0496     4.05       0.0004
0.6    0.1    4.0565     4.06       0.0035


Typically, the error for elliptic second-order problems of the type (12.27) with smooth
coefficient functions is $O(h^2)$, where $h$ is the maximum diameter of the triangular elements.
Piecewise bilinear basis functions on rectangular elements are also expected to give $O(h^2)$
results, where $h$ is the maximum diagonal length of the rectangular elements. Other classes
of basis functions can be used to give $O(h^4)$ results, but the construction is more complex.
Efficient error theorems for finite-element methods are difficult to state and apply because 
the accuracy of the approximation depends on the regularity of the boundary as well as on 
the continuity properties of the solution. 

The Finite-Element method can also be applied to parabolic and hyperbolic partial 
differential equations, but the minimization procedure is more difficult. A good survey on 
the advantages and techniques of the Finite-Element method applied to various physical 
problems can be found in a paper by [Fi]. For a more extensive discussion, refer to [SF], 
[ZM], or [AB]. 


EXERCISE SET 12.4 


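1. Use Algorithm 12.5 to approximate the solution to the following partial differential equation (see the
figure):

\[
\begin{aligned}
&\frac{\partial}{\partial x}\left(y^2\frac{\partial u}{\partial x}(x, y)\right) + \frac{\partial}{\partial y}\left(y^2\frac{\partial u}{\partial y}(x, y)\right) - y\,u(x, y) = -x, \quad (x, y) \in D, \\
&u(x, 0.5) = 2x, \quad 0 \le x \le 0.5, \qquad u(0, y) = 0, \quad 0.5 \le y \le 1, \\
&y^2\frac{\partial u}{\partial x}(x, y)\cos\theta_1 + y^2\frac{\partial u}{\partial y}(x, y)\cos\theta_2 = \frac{\sqrt{2}}{2}(y - x) \quad \text{for } (x, y) \in S_2.
\end{aligned}
\]

Let $M = 2$; $T_1$ have vertices $(0, 0.5)$, $(0.25, 0.75)$, $(0, 1)$; and $T_2$ have vertices $(0, 0.5)$, $(0.5, 0.5)$, and
$(0.25, 0.75)$.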




2. Repeat Exercise 1, using instead the triangles

$T_1$: $(0, 0.75)$, $(0, 1)$, $(0.25, 0.75)$;
$T_2$: $(0.25, 0.5)$, $(0.25, 0.75)$, $(0.5, 0.5)$;
$T_3$: $(0, 0.5)$, $(0, 0.75)$, $(0.25, 0.75)$;
$T_4$: $(0, 0.5)$, $(0.25, 0.5)$, $(0.25, 0.75)$.

3. Approximate the solution to the partial differential equation

\[ \frac{\partial^2 u}{\partial x^2}(x, y) + \frac{\partial^2 u}{\partial y^2}(x, y) - 12.5\pi^2 u(x, y) = -25\pi^2 \sin\frac{5\pi}{2}x \sin\frac{5\pi}{2}y, \quad 0 < x, y < 0.4, \]

subject to the Dirichlet boundary condition

\[ u(x, y) = 0, \]

using the Finite-Element Algorithm 12.5 with the elements given in the accompanying figure. Compare
the approximate solution to the actual solution,

\[ u(x, y) = \sin\frac{5\pi}{2}x \sin\frac{5\pi}{2}y, \]

at the interior vertices and at the points $(0.125, 0.125)$, $(0.125, 0.25)$, $(0.25, 0.125)$, and $(0.25, 0.25)$.

[Figure: elements on the square region, with axis ticks at 0.1, 0.2, 0.3, and 0.4 in both $x$ and $y$.]

4. Repeat Exercise 3 with $f(x, y) = -25\pi^2 \cos\frac{5\pi}{2}x \cos\frac{5\pi}{2}y$, using the Neumann boundary condition

\[ \frac{\partial u}{\partial n}(x, y) = 0. \]

The actual solution for this problem is

\[ u(x, y) = \cos\frac{5\pi}{2}x \cos\frac{5\pi}{2}y. \]

5. A silver plate in the shape of a trapezoid (see the accompanying figure) has heat being uniformly
generated at each point at the rate $q = 1.5$ cal/cm$^3$·s. The steady-state temperature $u(x, y)$ of the plate
satisfies the Poisson equation

\[ \frac{\partial^2 u}{\partial x^2}(x, y) + \frac{\partial^2 u}{\partial y^2}(x, y) = \frac{-q}{k}, \]

where $k$, the thermal conductivity, is 1.04 cal/cm·deg·s. Assume that the temperature is held at 15°C
on $L_2$, that heat is lost on the slanted edges $L_1$ and $L_3$ according to the boundary condition $\partial u/\partial n = 4$,
and that no heat is lost on $L_4$; that is, $\partial u/\partial n = 0$. Approximate the temperature of the plate at
$(1, 0)$, $(4, 0)$, and $\left(\frac{5}{2}, \sqrt{3}/2\right)$ by using Algorithm 12.5.


12.5 Survey of Methods and Software


In this chapter, methods to approximate solutions to partial differential equations were con- 
sidered. We restricted our attention to Poisson’s equation as an example of an elliptic partial 
differential equation, the heat or diffusion equation as an example of a parabolic partial dif- 
ferential equation, and the wave equation as an example of a hyperbolic partial differential 
equation. Finite-difference approximations were discussed for these three examples. 

Poisson’s equation on a rectangle required the solution of a large sparse linear sys- 
tem, for which iterative techniques, such as the SOR method, are recommended. Four 
finite-difference methods were presented for the heat equation. The Forward-Difference 
and Richardson’s methods had stability problems, so the Backward-Difference method and 
the Crank-Nicolson methods were introduced. Although a tridiagonal linear system must be 
solved at each time step with these implicit methods, they are more stable than the explicit 
Forward-Difference and Richardson’s methods. The Finite-Difference method for the wave 
equation is explicit and can also have stability problems for certain choices of time and space
discretizations. 

In the last section of the chapter, we presented an introduction to the Finite-Element 
method for a self-adjoint elliptic partial differential equation on a polygonal domain. Al- 
though our methods will work adequately for the problems and examples in the textbook, 
more powerful generalizations and modifications of these techniques are required for com- 
mercial applications. 

One of the subroutines from the IMSL Library is used to solve the partial differential 
equation 


with boundary conditions 


\[ \alpha(x, t)u(x, t) + \beta(x, t)\frac{\partial u}{\partial x}(x, t) = \gamma(x, t). \]


The routine is based on collocation at Gaussian points on the x-axis for each value of ¢ and 
uses cubic Hermite splines as basis functions. Another subroutine from IMSL is used to 
solve Poisson’s equation on a rectangle. The method of solution is based on a choice of 
second- or fourth-order finite differences on a uniform mesh. 

The NAG Library has a number of subroutines for partial differential equations. One 
subroutine is used for Laplace’s equation on an arbitrary domain in the xy-plane, and another 
is used to solve a single parabolic partial differential equation by the method of lines. 
There are specialized packages, such as NASTRAN, consisting of codes for the Finite-
Element method. These packages are popular in engineering applications. The package
FISHPACK in the netlib library is used to solve separable elliptic partial differential equa- 
tions. General codes for partial differential equations are difficult to write because of the 
problem of specifying domains other than common geometrical figures. Research in the 
area of solution of partial differential equations is currently very active. 

We have only presented a small sample of the many techniques used for approximating 
the solutions to the problems involving partial differential equations. Further information 
on the general topic can be found in Lapidus and Pinder [LP], Twizell [Tw], and the recent 
book by Morton and Mayers [MM]. Software information can be found in Rice and Boisvert 
[RB] and in Bank [Ban]. 

Books that focus on finite-difference methods include Strikwerda [Stri], Thomas [Th], 
and Shashkov and Steinberg [ShS]. Strang and Fix [SF] and Zienkiewicz and Morgan [ZM] 
are good sources for information on the finite-element method. Time-dependent equations 
are treated in Schiesser [Schi] and in Gustafsson, Kreiss, and Oliger [GKO]. Birkhoff and 
Lynch [BL] and Roache [Ro] discuss the solution to elliptic problems. 

Multigrid methods use coarse grid approximations and iterative techniques to pro- 
vide approximations on finer grids. References on these techniques include Briggs [Brigg], 
McCormick [Mc], and Bramble [Bram]. 
Answers for Selected Exercises 


Exercise Set 1.1 (Page 14) 


1. For each part, f ∈ C[a, b] on the given interval. Since f(a) and f(b) are of opposite sign, the Intermediate Value Theorem 
implies that a number c exists with f(c) = 0. 

3. For each part, f ∈ C[a, b], f' exists on (a, b) and f(a) = f(b) = 0. Rolle's Theorem implies that a number c exists in (a, b) 
with f'(c) = 0. For part (d), we can use [a, b] = [−1, 0] or [a, b] = [0, 2]. 

5. For x < 0, f(x) < 2x + k < 0, provided that x < −(1/2)k. Similarly, for x > 0, f(x) > 2x + k > 0, provided that x > −(1/2)k. 
By Theorem 1.11, there exists a number c with f(c) = 0. If f(c) = 0 and f(c') = 0 for some c' ≠ c, then by Theorem 1.7, 
there exists a number p between c and c' with f'(p) = 0. However, f'(x) = 3x^2 + 2 > 0 for all x. 


7. a. P2(x) = 0 b. R2(0.5) = 0.125; actual error = 0.125 
c. P2(x) = 1 + 3(x − 1) + 3(x − 1)^2 d. R2(0.5) = −0.125; actual error = −0.125 
9. Since 

P2(x) = 1 + x and R2(x) = −2e^ξ (sin ξ + cos ξ) x^3 / 6, 

for some ξ between x and 0, we have the following: 
a. P2(0.5) = 1.5 and |f(0.5) − P2(0.5)| ≤ 0.0532; b. |f(x) − P2(x)| ≤ 1.252; 
c. ∫_0^1 f(x) dx ≈ 1.5; 
d. |∫_0^1 f(x) dx − ∫_0^1 P2(x) dx| ≤ ∫_0^1 |R2(x)| dx ≤ 0.313, and the actual error is 0.122. 
11. P3(x) = (x − 1)^2 − (1/2)(x − 1)^3 
a. P3(0.5) = 0.312500, f(0.5) = 0.346574. An error bound is 0.2916, and the actual error is 0.034074. 
b. |f(x) − P3(x)| ≤ 0.2916 on [0.5, 1.5] 
c. ∫_{0.5}^{1.5} P3(x) dx = 0.083, ∫_{0.5}^{1.5} (x − 1) ln x dx = 0.088020 
d. An error bound is 0.0583, and the actual error is 4.687 × 10^-3. 
13. P4(x) = x + x^3 
a. |f(x) − P4(x)| ≤ 0.012405 b. ∫_0^{0.4} P4(x) dx = 0.0864, ∫_0^{0.4} x e^{x^2} dx = 0.086755 
c. 8.27 × 10^-4 
d. P4'(0.2) = 1.12, f'(0.2) = 1.124076. The actual error is 4.076 × 10^-3. 
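
The two integrals quoted in Exercise 13(b) can be checked in closed form. The sketch below assumes f(x) = x e^{x^2} and the interval [0, 0.4]; both are inferred from the tabulated answer values rather than stated in this section, and the function names are illustrative only.

```python
import math

def integral_p4(b):
    """Closed-form integral of P4(x) = x + x**3 over [0, b]."""
    return b**2 / 2 + b**4 / 4

def integral_f(b):
    """Closed-form integral of the assumed f(x) = x*exp(x**2) over [0, b]."""
    return (math.exp(b**2) - 1) / 2

approx = integral_p4(0.4)   # integral of the Taylor polynomial
exact = integral_f(0.4)     # integral of the assumed f
print(f"{approx:.4f} {exact:.6f} {abs(exact - approx):.2e}")
```

The difference between the two values is the actual integration error, which should fall below the bound given in part (c).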
15. Since 42° = 7π/30 radians, use x0 = π/4. Then 

|Rn(7π/30)| ≤ (π/4 − 7π/30)^{n+1} / (n + 1)! < (0.053)^{n+1} / (n + 1)!. 

For |Rn(7π/30)| < 10^-6, it suffices to take n = 3. To 7 digits, cos 42° = 0.7431448 and P3(42°) = P3(7π/30) = 0.7431446, so the 
actual error is 2 × 10^-7. 


17. a. P3(x) = ln(3) + (2/3)(x − 1) + (1/9)(x − 1)^2 − (10/81)(x − 1)^3 b. max_{0≤x≤1} |f(x) − P3(x)| = |f(0) − P3(0)| = 0.02663366 
c. P3(x) = ln(2) + (1/2)x^2 d. max_{0≤x≤1} |f(x) − P3(x)| = |f(1) − P3(1)| = 0.09453489 
e. P3(0) approximates f(0) better than P3(1) approximates f(1). 
19. Pn(x) = Σ_{k=0}^{n} (1/k!) x^k, n ≥ 7 
21. A bound for the maximum error is 0.0026. 


23. a. The assumption is that f(x_i) = 0 for each i = 0, 1, ..., n. Applying Rolle's Theorem on each of the intervals [x_i, x_{i+1}] 
implies that for each i = 0, 1, ..., n − 1 there exists a number z_i with f'(z_i) = 0. In addition, we have 
a ≤ x0 < z0 < x1 < z1 < ··· < z_{n−1} < xn ≤ b. 

b. Apply the logic in part (a) to the function g(x) = f'(x) with the number of zeros of g in [a, b] reduced by 1. This implies 
that numbers w_i, for i = 0, 1, ..., n − 2 exist with g'(w_i) = f''(w_i) = 0, and 
z0 < w0 < z1 < w1 < ··· < z_{n−2} < w_{n−2} < z_{n−1}. 
c. Continuing by induction following the logic in parts (a) and (b) provides n − j + 1 distinct zeros of f^(j) in [a, b]. 
d. The conclusion of the theorem follows from part (c) when j = n, for in this case there will be (at least) n − (n − 1) = 1 
zero of f^(n) in [a, b]. 
25. Since R2(1) = (1/6)e^ξ, for some ξ in (0, 1), we have |E − R2(1)| = (1/6)|1 − e^ξ| ≤ (1/6)(e − 1). 


27. a. Let x0 be any number in [a, b]. Given ε > 0, let δ = ε/L. If |x − x0| < δ and a ≤ x ≤ b, then 
|f(x) − f(x0)| ≤ L|x − x0| < ε. 
b. Using the Mean Value Theorem, we have 

|f(x2) − f(x1)| = |f'(ξ)||x2 − x1|, 
for some ξ between x1 and x2, so 
|f(x2) − f(x1)| ≤ L|x2 − x1|. 

c. One example is f(x) = x^{1/3} on [0, 1]. 
29. a. Since f is continuous at p and f(p) ≠ 0, there exists a δ > 0 with 

|f(x) − f(p)| < |f(p)|/2, 

for |x − p| < δ and a < x < b. We restrict δ so that [p − δ, p + δ] is a subset of [a, b]. Thus, for x ∈ [p − δ, p + δ], we 
have x ∈ [a, b]. So 

−|f(p)|/2 < f(x) − f(p) < |f(p)|/2 

and 

f(p) − |f(p)|/2 < f(x) < f(p) + |f(p)|/2. 

If f(p) > 0, then 

f(p) − |f(p)|/2 = f(p)/2 > 0, so f(x) > f(p) − |f(p)|/2 > 0. 
If f(p) < 0, then |f(p)| = −f(p), and 
f(x) < f(p) + |f(p)|/2 = f(p) − f(p)/2 = f(p)/2 < 0. 

In either case, f(x) ≠ 0, for x ∈ [p − δ, p + δ]. 

b. Since f is continuous at p and f(p) = 0, there exists a δ > 0 with 
|f(x) − f(p)| < k, for |x − p| < δ and a < x < b. 
We restrict δ so that [p − δ, p + δ] is a subset of [a, b]. Thus, for x ∈ [p − δ, p + δ], we have 

|f(x)| = |f(x) − f(p)| < k. 
Exercise Set 1.2 (Page 28) 


1.    Absolute Error    Relative Error 
a.    0.001264          4.025 × 10^-4 
b.    7.346 × 10^-6     2.338 × 10^-6 
c.    2.818 × 10^-4     1.037 × 10^-4 
d.    2.136 × 10^-4     1.510 × 10^-4 
e.    2.647 × 10^1      1.202 × 10^-3 
f.    1.454 × 10^1      1.050 × 10^-2 
g.    420               1.042 × 10^-2 
h.    3.343 × 10^3      9.213 × 10^-3 
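
Entries of this kind follow directly from the definitions |p − p*| and |p − p*|/|p|. As a minimal sketch, assuming part (a) compares p = π with p* = 22/7 (the exercise statement is not reproduced here, but this pairing matches the tabulated values), the two errors can be computed as follows; the helper name is illustrative.

```python
import math

def abs_rel_error(p, p_star):
    """Return (absolute error, relative error) of p_star as an approximation to p."""
    abs_err = abs(p - p_star)
    rel_err = abs_err / abs(p)  # requires p != 0
    return abs_err, rel_err

# Assumed data for part (a): p = pi, p* = 22/7.
abs_err, rel_err = abs_rel_error(math.pi, 22 / 7)
print(f"{abs_err:.4g} {rel_err:.4g}")
```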

3. The largest intervals are 


a. (149.85, 150.15) b. (899.1, 900.9) c. (1498.5, 1501.5) d. (89.91, 90.09) 


5.    Approximation    Absolute Error    Relative Error 
a.    134              0.079             5.90 × 10^-4 
b.    133              0.499             3.77 × 10^-3 
c.    2.00             0.327             0.195 
d.    1.67             0.003             1.79 × 10^-3 
e.    1.80             0.154             0.0786 
f.    −15.1            0.0546            3.60 × 10^-3 
g.    0.286            2.86 × 10^-4      10^-3 
h.    0.00             0.0215            1.00 
7.    Approximation    Absolute Error    Relative Error 
a.    133              0.921             6.88 × 10^-3 
b.    132              0.501             3.78 × 10^-3 
c.    1.00             0.673             0.402 
d.    1.67             0.003             1.79 × 10^-3 
e.    3.55             1.60              0.817 
f.    −15.2            0.0454            0.00299 
g.    0.284            0.00171           0.00600 
h.    0                0.02150           1 
9.    Approximation    Absolute Error    Relative Error 
a.    3.14557613       3.983 × 10^-3     1.268 × 10^-3 
b.    3.14162103       2.838 × 10^-5     9.032 × 10^-6 
11. a. lim_{x→0} (x cos x − sin x)/(x − sin x) = lim_{x→0} (−x sin x)/(1 − cos x) = lim_{x→0} (−sin x − x cos x)/(sin x) = lim_{x→0} (−2 cos x + x sin x)/(cos x) = −2 
b. −1.941 
c. [x(1 − (1/2)x^2) − (x − (1/6)x^3)] / [x − (x − (1/6)x^3)] = −2 
d. The relative error in part (b) is 0.029. The relative error in part (c) is 0.00050. 
13.   x1           Absolute Error   Relative Error   x2          Absolute Error   Relative Error 
a.    92.26        0.01542          1.672 × 10^-4    0.005419    6.273 × 10^-7    1.157 × 10^-4 
b.    0.005421     1.264 × 10^-6    2.333 × 10^-4    −92.26      4.580 × 10^-3    4.965 × 10^-5 
c.    10.98        6.875 × 10^-3    6.257 × 10^-4    0.001149    7.566 × 10^-8    6.584 × 10^-5 
d.    −0.001149    7.566 × 10^-8    6.584 × 10^-5    −10.98      6.875 × 10^-3    6.257 × 10^-4 


15. The machine numbers are equivalent to 


a. 3224 b. −3224 c. 1.32421875 
d. 1.3242187500000002220446049250313080847263336181640625 


17. b. The first formula gives —0.00658, and the second formula gives —0.0100. The true three-digit value is —0.0116. 


19. The approximate solutions to the systems are 


a. x = 2.451, y = —1.635 b. x = 507.7, y = 82.00 
21. a. In nested form, we have f(x) = (((1.01e^x − 4.62)e^x − 3.11)e^x + 12.2)e^x − 1.99. 
b. −6.79 
c. −7.07 
23. a. n = 77 b. n = 35 
25. a. m = 17 
b. (m choose k) = m! / (k!(m − k)!) = m(m − 1)···(m − k + 1)(m − k)! / (k!(m − k)!) = (m/k)((m − 1)/(k − 1)) ··· ((m − k + 1)/1) 
c. m = 181707 
d. 2,597,000; actual error 1960; relative error 7.541 × 10^-4 
27. a. 124.03 b. 124.03 c. −124.03 d. −124.03 
e. 0.0065 f. 0.0065 g. −0.0065 h. −0.0065 


Exercise Set 1.3 (Page 39) 
1. a. The approximate sums are 1.53 and 1.54, respectively. The actual value is 1.549. Significant roundoff error occurs earlier 
with the first method. 
3. a. 2000 terms b. 20,000,000,000 terms 
5. 3 terms 
7. The rates of convergence are: 
a. O(h^2) b. O(h) c. O(h^2) d. O(h) 


13. a. If |α_n − α|/(1/n^p) ≤ K, then |α_n − α| ≤ K(1/n^p) ≤ K(1/n^q) since 0 < q < p. Thus, |α_n − α|/(1/n^q) ≤ K and 
{α_n} → α with rate of convergence O(1/n^q). 

b.  n      1/n     1/n^2     1/n^3       1/n^4 
    5      0.2     0.04      0.008       0.0016 
    10     0.1     0.01      0.001       0.0001 
    50     0.02    0.0004    8 × 10^-6   1.6 × 10^-7 
    100    0.01    10^-4     10^-6       10^-8 

O(1/n^4) is the most rapid convergence rate. 
15. Suppose that for sufficiently small |x| we have positive constants K1 and K2 independent of x, for which 
|F1(x) − L1| ≤ K1|x|^α and |F2(x) − L2| ≤ K2|x|^β. 

Let c = max(|c1|, |c2|, 1), K = max(K1, K2), and δ = max(α, β). 
a. We have 
|F(x) − c1L1 − c2L2| = |c1(F1(x) − L1) + c2(F2(x) − L2)| 
≤ |c1|K1|x|^α + |c2|K2|x|^β 
≤ cK[|x|^α + |x|^β] 
≤ cK|x|^γ[1 + |x|^{δ−γ}] 
≤ K̃|x|^γ, 
for sufficiently small |x| and some constant K̃. Thus, F(x) = c1L1 + c2L2 + O(x^γ). 
b. We have 
|G(x) − L1 − L2| = |F1(c1x) + F2(c2x) − L1 − L2| 
≤ K1|c1x|^α + K2|c2x|^β 
≤ Kc^δ[|x|^α + |x|^β] 
≤ Kc^δ|x|^γ[1 + |x|^{δ−γ}] 
≤ K̃|x|^γ, 

for sufficiently small |x| and some constant K̃. Thus, G(x) = L1 + L2 + O(x^γ). 
17. a. 354224848179261915075 b. 0.3542248538 × 10^21 


c. The result in part (a) is computed using exact integer arithmetic, and the result in part (b) is computed using 10-digit 


rounding arithmetic. 
d. The result in part (a) required traversing a loop 98 times. 


e. The result is the same as the result in part (a). 
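
The exact-integer computation described in parts (a) and (d) can be sketched as a simple loop. This assumes the exercise concerns the Fibonacci number F_100 with F_1 = F_2 = 1 (an inference from the 21-digit value and the 98 loop passes; the exercise statement is not shown here). Python integers have arbitrary precision, so no rounding occurs.

```python
def fibonacci(n):
    """Return F_n with F_1 = F_2 = 1, using exact integer arithmetic."""
    a, b = 1, 1                 # F_1, F_2
    for _ in range(n - 2):      # 98 passes when n = 100
        a, b = b, a + b
    return b

f100 = fibonacci(100)
print(f100)
```

A closed-form evaluation in floating point would agree only to the working precision, which is the contrast parts (b) and (c) draw.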


Exercise Set 2.1 (Page 54) 


1. p3 = 0.625 
3. The Bisection method gives: 
a. p7 = 0.5859 b. p8 = 3.002 c. p7 = 3.419 
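
As a minimal sketch of how such Bisection iterates arise, the code below assumes the equation is x^3 − 7x^2 + 14x − 6 = 0 on [0, 1]; the equation is not stated in this section, but this choice reproduces the value in part (a). The function name and signature are illustrative.

```python
def bisection(f, a, b, n_iter):
    """Return the n_iter-th Bisection midpoint for f on [a, b].

    Assumes f(a) and f(b) have opposite signs.
    """
    fa = f(a)
    p = a
    for _ in range(n_iter):
        p = a + (b - a) / 2     # midpoint of the current bracket
        fp = f(p)
        if fp == 0:
            return p
        if fa * fp > 0:         # root lies in [p, b]
            a, fa = p, fp
        else:                   # root lies in [a, p]
            b = p
    return p

def f(x):
    # Assumed test equation, see the note above.
    return x**3 - 7 * x**2 + 14 * x - 6

p7 = bisection(f, 0.0, 1.0, 7)
print(round(p7, 4))
```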
5. The Bisection method gives: 
a. p17 = 0.641182 b. p17 = 0.257530 
c. For the interval [−3, −2], we have p17 = −2.191307, and for the interval [−1, 0], we have p17 = −0.798164. 
d. For the interval [0.2, 0.3], we have p14 = 0.297528, and for the interval [1.2, 1.3], we have p14 = 1.256622. 
7. a. [Graph omitted.] 
b. Using [1.5, 2] from part (a) gives p16 = 1.89550018. 
9. a. [Graph omitted.] 
b. p17 = 1.00762177 
11. a. 2 b. −2 c. −1 d. 1 
13. The third root of 25 is approximately p14 = 2.92401, using [2, 3]. 
15. A bound is n ≥ 14, and p14 = 1.32477. 
17. Since lim_{n→∞}(pn − p_{n−1}) = lim_{n→∞} 1/n = 0, the difference in the terms goes to zero. However, pn is the nth term of the 
divergent harmonic series, so lim_{n→∞} pn = ∞. 
19. The depth of the water is 0.838 ft. 


Exercise Set 2.2 (Page 64) 


1. For the value of x under consideration we have 
a. x = (3 + x − 2x^2)^{1/4} ⇔ x^4 = 3 + x − 2x^2 ⇔ f(x) = 0 
b. x = ((x + 3 − x^4)/2)^{1/2} ⇔ 2x^2 = x + 3 − x^4 ⇔ f(x) = 0 
c. x = ((x + 3)/(x^2 + 2))^{1/2} ⇔ x^2(x^2 + 2) = x + 3 ⇔ f(x) = 0 
d. x = (3x^4 + 2x^2 + 3)/(4x^3 + 4x − 1) ⇔ 4x^4 + 4x^2 − x = 3x^4 + 2x^2 + 3 ⇔ f(x) = 0 

3. The order in descending speed of convergence is (b), (d), (a). The sequence in (c) does not converge. 
5. With g(x) = (3x^2 + 3)^{1/4} and p0 = 1, p6 = 1.94332 is accurate to within 0.01. 

7. Since g'(x) = (1/4) cos(x/2), g is continuous and g' exists on [0, 2π]. Further, g'(x) = 0 only when x = π, so that 
g(0) = g(2π) = π ≤ g(x) ≤ g(π) = π + 1/2 and |g'(x)| ≤ 1/4, for 0 ≤ x ≤ 2π. Theorem 2.3 implies that a unique fixed point 
p exists in [0, 2π]. With k = 1/4 and p0 = π, we have p1 = π + 1/2. Corollary 2.5 implies that 

|pn − p| ≤ (k^n/(1 − k)) |p1 − p0| = (2/3)(1/4)^n. 

For the bound to be less than 0.1, we need n ≥ 4. However, p3 = 3.626996 is accurate to within 0.01. 

9. For p0 = 1.0 and g(x) = 0.5(x + 3/x), we have √3 ≈ p4 = 1.73205. 
11. a. With [0, 1] and p0 = 0, we have p9 = 0.257531. b. With [2.5, 3.0] and p0 = 2.5, we have p17 = 2.690650. 
c. With [0.25, 1] and p0 = 0.25, we have p14 = 0.909999. d. With [0.3, 0.7] and p0 = 0.3, we have p39 = 0.469625. 
e. With [0.3, 0.6] and p0 = 0.3, we have p48 = 0.448059. f. With [0, 1] and p0 = 0, we have p6 = 0.704812. 
13. For g(x) = (2x^2 − 10 cos x)/(3x), we have the following: 
p0 = 3 ⇒ p8 = 3.16193; p0 = −3 ⇒ p8 = −3.16193. 
For g(x) = arccos(−0.1x^2), we have the following: 
p0 = 1 ⇒ p11 = 1.96882; p0 = −1 ⇒ p11 = −1.96882. 
15. With g(x) = (1/π) arcsin(−x/2) + 2, we have ps = 1.683855. 
17. One of many examples is g(x) = √(2x − 1) on [1/2, 1]. 
21. Replace the second sentence in the proof with: “Since g satisfies a Lipschitz condition on [a, b] with a Lipschitz constant 
L < 1, we have, for each n, 

|pn − p| = |g(p_{n−1}) − g(p)| ≤ L|p_{n−1} − p|.” 
The rest of the proof is the same, with k replaced by L. 
23. With g(t) = 501.0625 − 201.0625e^{−0.4t} and p0 = 5.0, p3 = 6.0028 is within 0.01 s of the actual time. 
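
The fixed-point answers in this exercise set all come from repeating p_n = g(p_{n−1}). As a minimal sketch, taking the square-root iteration of Exercise 9 and assuming g(x) = 0.5(x + 3/x) with p0 = 1.0, four applications of g already agree with √3 to the digits quoted; the helper name is illustrative.

```python
import math

def fixed_point(g, p0, n_iter):
    """Apply p <- g(p) n_iter times, starting from p0, and return the result."""
    p = p0
    for _ in range(n_iter):
        p = g(p)
    return p

def g(x):
    # Assumed iteration function for approximating sqrt(3).
    return 0.5 * (x + 3 / x)

p4 = fixed_point(g, 1.0, 4)
print(f"{p4:.5f} {math.sqrt(3):.5f}")
```

The iterates here are 2, 1.75, 1.732143, and then p4, so the rapid agreement reflects the quadratic convergence of this particular g.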


Exercise Set 2.3 (Page 75) 


1. p2 = 2.60714 
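
A short sketch of two Newton steps shows where a value like this comes from. It assumes f(x) = x^2 − 6 and p0 = 1, which is not stated in this section but reproduces the answer above; the function names are illustrative.

```python
def newton(f, df, p0, n_iter):
    """Return the n_iter-th Newton iterate p_n = p_{n-1} - f(p_{n-1})/f'(p_{n-1})."""
    p = p0
    for _ in range(n_iter):
        p = p - f(p) / df(p)    # assumes df(p) != 0 at every iterate
    return p

# Assumed problem data: f(x) = x^2 - 6, p0 = 1.
p2 = newton(lambda x: x * x - 6, lambda x: 2 * x, 1.0, 2)
print(f"{p2:.5f}")
```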


3. a. 2.45454 b. 2.44444 c. Part (b) is better. 
5. a. For po = 2, we have ps = 2.69065. b. For pp = —3, we have p3 = —2.87939. 
c. For po = 0, we have ps = 0.73909. d. For po = 0, we have p3 = 0.96434. 
7. Using the endpoints of the intervals as po and p;, we have: 
a. Pi; = 2.69065 b. p7 = —2.87939 c. Po = 0.73909 d. ps = 0.96433 
9. Using the endpoints of the intervals as po and p;, we have: 
a. Pig = 2.69060 b. po = —2.87938 c. p7 = 0.73908 d. po = 0.96433 


11. a. Newton’s method with po = 1.5 gives p3 = 1.51213455. 
The Secant method with po = 1 and p; = 2 gives pio = 1.51213455. 
The Method of False Position with pp = 1 and p; = 2 gives py7 = 1.51212954. 


b. Newton’s method with po = 0.5 gives ps = 0.976773017. 
The Secant method with pp = 0 and p; = 1 gives ps = 10.976773017. 
The Method of False Position with p) = 0 and p; = | gives ps = 0.976772976. 


13. For po = 1, we have ps = 0.589755. The point has the coordinates (0.589755, 0.347811). 


15. The equation of the tangent line is 

y − f(p_{n−1}) = f'(p_{n−1})(x − p_{n−1}). 

To complete this problem, set y = 0 and solve for x = pn. 

17. a. For po = —1 and p; = 0, we have pi7 = —0.04065850, and for po = 0 and p; = 1, we have po = 0.9623984. 
b. For po = —1 and p; = 0, we have ps = —0.04065929, and for po = 0 and p; = 1, we have pi. = —0.04065929. 
c. For po = —0.5, we have ps = —0.04065929, and for po = 0.5, we have pr; = 0.9623989. 


19. This formula involves the subtraction of nearly equal numbers in both the numerator and denominator if p_{n−1} and p_{n−2} are 
nearly equal. 


21. a. po = —10, py, = —4.30624527 b. po = —5,ps = —4.30624527 
Cc. po = —3, ps = 0.824498585 d. po = —1,p4 = —0.824498585 
e. Po = 0, and you cannot compute p;, since f’(0) = 0 f. po = 1, p4 = 0.824498585 
g. Po = 3, ps = —0.824498585 h. po = 5, ps = 4.30624527 
i. po = 10, pi) = 4.30624527 


23. For f(x) = ln(x^2 + 1) − e^{0.4x} cos πx, we have the following roots. 
a. For po = —0.5, we have p; = —0.4341431. 
b. For po = 0.5, we have p3 = 0.4506567. 
For po = 1.5, we have p3 = 1.7447381. 
For po = 2.5, we have ps = 2.2383198. 
For po = 3.5, we have p4 = 3.7090412. 
c. The initial approximation n — 0.5 is quite reasonable. 
d. For po = 24.5, we have po = 24.4998870. 
25. The two numbers are approximately 6.512849 and 13.487151. 
27. The borrower can afford to pay at most 8.10%. 
29. a. solve(3^(3x+1) − 7·5^(2x), x) and fsolve(3^(3x+1) − 7·5^(2x), x) both fail. 
b. plot(3^(3x+1) − 7·5^(2x), x = a..b) generally yields no useful information. However, using a = 10.5 and b = 11.5 in the plot 
command shows that f(x) has a root near x = 11. 
c. With p0 = 11, p5 = 11.0094386442681716 is accurate to 10^-16. 
d. p = ln(3/7) / ln(25/27) 
31. We have P_L = 265816, c = −0.75658125, and k = 0.045017502. The 1980 population is P(30) = 222,248,320, and the 
2010 population is P(60) = 252,967,030. 


33. Using po = 0.5 and p; = 0.9, the Secant method gives ps = 0.842. 


Exercise Set 2.4 (Page 85) 


1. a. For po = 0.5, we have pi3 = 0.567135. b. For po = —1.5, we have po3 = —1.414325. 
c. For po = 0.5, we have px = 0.641166. d. For po = —0.5, we have po3 = —0.183274. 
3. Modified Newton’s method in Equation (2.11) gives the following: 
a. For po = 0.5, we have p3 = 0.567143. b. For po = —1.5, we have p2 = —1.414158. 
c. For po = 0.5, we have p3 = 0.641274. d. For po = —0.5, we have ps = —0.183319. 


5. Newton’s method with p0 = −0.5 gives p13 = −0.169607. Modified Newton’s method in Eq. (2.11) with p0 = −0.5 gives 
p11 = −0.169607. 


7. a. For k > 0, 


so the convergence is linear. 
b. We need to have N > 10/, 


9. Typical examples are 


a. pn = 10^{−3^n} b. pn = 10^{−α^n} 
11. This follows from the fact that lim_{n→∞} [(b − a)/2^{n+1}] / [(b − a)/2^n] = 1/2. 
13. If |p_{n+1} − p| / |pn − p|^3 = 0.75 and |p0 − p| = 0.5, then 

|pn − p| = (0.75)^{(3^n − 1)/2} |p0 − p|^{3^n}. 
To have |pn − p| ≤ 10^-8 requires that n ≥ 3. 


Exercise Set 2.5 (Page 90) 


1. The results are listed in the following table. 


        a.          b.          c.          d. 
p̂0     0.258684    0.907859    0.548101    0.731385 
p̂1     0.257613    0.909568    0.547915    0.736087 
p̂2     0.257536    0.909917    0.547847    0.737653 
p̂3     0.257531    0.909989    0.547823    0.738469 
p̂4     0.257530    0.910004    0.547814    0.738798 
p̂5     0.257530    0.910007    0.547810    0.738958 

3. p0^(1) = 0.826427 
5. p1^(0) = 1.5 
7. For g(x) = √(1 + 1/x) and p0 = 1, we have p3 = 1.32472. 
9. For g(x) = 0.5(x + 3/x) and p0 = 0.5, we have p4 = 1.73205. 
11. a. For g(x) = (2 − e^x + x^2)/3 and p0 = 0, we have p3 = 0.257530. 
b. For g(x) = 0.5(sin x + cos x) and p0 = 0, we have p4 = 0.704812. 
c. With po = 0.25, ps = 0.910007572. 
d. With po = 0.3, ps = 0.469621923. 
13. Aitken’s Δ^2 method gives: 
a. Pio = 0.045 b. py = 0.0363 


15. We have 
|p_{n+1} − pn| / |pn − p| = |p_{n+1} − p + p − pn| / |pn − p| = |(p_{n+1} − p)/(pn − p) − 1|, 
so 
lim_{n→∞} |p_{n+1} − pn| / |pn − p| = lim_{n→∞} |(p_{n+1} − p)/(pn − p) − 1| = 1. 
17. a. Hint: First show that pn − p = −(1/(n + 1)!) e^ξ x^{n+1}, where ξ is between 0 and 1. 
b.  n     pn           p̂n 
    0     1            3 
    1     2            2.75 
    2     2.5          2.72 
    3     2.6          2.71875 
    4     2.7083       2.7183 
    5     2.716        2.7182870 
    6     2.71805      2.7182823 
    7     2.7182539    2.7182818 
    8     2.7182787    2.7182818 
    9     2.7182815 
    10    2.7182818 


Exercise Set 2.6 (Page 100) 


1. a. For p0 = 1, we have po = 2.69065. 
b. For p0 = 1, we have ps = 0.53209; for p0 = −1, we have p3 = −0.65270; and for p0 = −3, we have p3 = −2.87939. 
c. For p0 = 1, we have ps = 1.32472. 
d. For p0 = 1, we have ps = 1.12412; and for p0 = 0, we have pg = −0.87605. 
e. For p0 = 0, we have ps = −0.47006; for p0 = −1, we have ps = −0.88533; and for p0 = −3, we have ps = −2.64561. 
f. For p0 = 0, we have pio = 1.49819. 
3. The following table lists the initial approximation and the roots. 


      p0    p1      p2      Approximate roots              Complex conjugate roots 
a.    −1    0       1       p7 = −0.34532 − 1.31873i       −0.34532 + 1.31873i 
      0     1       2       Po = 2.69065 
b.    0     1       2       Po = 0.53209 
      1     2       3       Po = −0.65270 
      −2    −3      −2.5    p4 = −2.87939 
c.    0     1       2       Ps = 1.32472 
      −2    −1      0       p7 = −0.66236 − 0.56228i       −0.66236 + 0.56228i 
d.    0     1       2       ps = 1.12412 
      2     3       4       p12 = −0.12403 + 1.74096i      −0.12403 − 1.74096i 
      −2    0       −1      Ps = −0.87605 
e.    0     1       2       p10 = −0.88533 
      0     −0.5            Ps = −0.47006 
      −1    −2      −3      Ds = −2.64561 
f.    0     1       2       Po = 1.49819 
      −1    −2      −3      p10 = −0.51363 − 1.09156i      −0.51363 + 1.09156i 
      1     0       −1      Ps = 0.26454 − 1.32837i        0.26454 + 1.32837i 
5. a. The roots are 1.244, 8.847, and −1.091, and the critical points are 0 and 6. 
b. The roots are 0.5798, 1.521, 2.332, and −2.432, and the critical points are 1, 2.001, and −1.5. 
7. The only real zero is 

((54 + 6√129)^{2/3} − 12) / (3(54 + 6√129)^{1/3}). 

9. The methods all find the solution 0.23235. 
11. The minimal material is approximately 573.64895 cm^2. 


Exercise Set 3.1 (Page 114) 


1. a. P1(x) = −0.148878x + 1; P2(x) = −0.452592x^2 − 0.0131009x + 1; P1(0.45) = 0.933005; 
|f(0.45) − P1(0.45)| = 0.032558; P2(0.45) = 0.902455; |f(0.45) − P2(0.45)| = 0.002008 

b. P1(x) = 0.467251x + 1; P2(x) = −0.0780026x^2 + 0.490652x + 1; P1(0.45) = 1.210263; 
|f(0.45) − P1(0.45)| = 0.006104; P2(0.45) = 1.204998; |f(0.45) − P2(0.45)| = 0.000839 

c. P1(x) = 0.874548x; P2(x) = −0.268961x^2 + 0.955236x; P1(0.45) = 0.393546; |f(0.45) − P1(0.45)| = 0.0212983; 
P2(0.45) = 0.375392; |f(0.45) − P2(0.45)| = 0.003828 

d. P1(x) = 1.031121x; P2(x) = 0.615092x^2 + 0.846593x; P1(0.45) = 0.464004; |f(0.45) − P1(0.45)| = 0.019051; 
P2(0.45) = 0.505523; |f(0.45) − P2(0.45)| = 0.022468 


3. a. |(f''(ξ)/2)(0.45 − 0)(0.45 − 0.6)| ≤ 0.135; |(f'''(ξ)/6)(0.45 − 0)(0.45 − 0.6)(0.45 − 0.9)| ≤ 0.00397 

b. |(f''(ξ)/2)(0.45 − 0)(0.45 − 0.6)| ≤ 0.03375; |(f'''(ξ)/6)(0.45 − 0)(0.45 − 0.6)(0.45 − 0.9)| ≤ 0.001898 

c. |(f''(ξ)/2)(0.45 − 0)(0.45 − 0.6)| ≤ 0.135; |(f'''(ξ)/6)(0.45 − 0)(0.45 − 0.6)(0.45 − 0.9)| ≤ 0.010125 

d. |(f''(ξ)/2)(0.45 − 0)(0.45 − 0.6)| ≤ 0.06779; |(f'''(ξ)/6)(0.45 − 0)(0.45 − 0.6)(0.45 − 0.9)| ≤ 0.151 


5. a.  n    x0, x1, ..., xn            Pn(8.4) 
       1    8.3, 8.6                   17.87833 
       2    8.3, 8.6, 8.7              17.87716 
       3    8.3, 8.6, 8.7, 8.1         17.87714 
   b.  n    x0, x1, ..., xn            Pn(−1/3) 
       1    −0.5, −0.25                0.21504167 
       2    −0.5, −0.25, 0.0           0.16988889 
       3    −0.5, −0.25, 0.0, −0.75    0.17451852 
   c.  n    x0, x1, ..., xn            Pn(0.25) 
       1    0.2, 0.3                   −0.13869287 
       2    0.2, 0.3, 0.4              −0.13259734 
       3    0.2, 0.3, 0.4, 0.1         −0.13277477 
   d.  n    x0, x1, ..., xn            Pn(0.9) 
       1    0.8, 1.0                   0.44086280 
       2    0.8, 1.0, 0.7              0.43841352 
       3    0.8, 1.0, 0.7, 0.6         0.44198500 
7. a.  n    Actual Error     Error Bound 
       1    1.180 × 10^-3    1.200 × 10^-3 
       2    1.367 × 10^-5    1.452 × 10^-5 
   b.  n    Actual Error     Error Bound 
       1    4.052 × 10^-2    4.515 × 10^-2 
       2    4.630 × 10^-3    4.630 × 10^-3 
   c.  n    Actual Error     Error Bound 
       1    5.921 × 10^-3    6.097 × 10^-3 
       2    1.746 × 10^-4    1.813 × 10^-4 
   d.  n    Actual Error     Error Bound 
       1    2.730 × 10^-3    1.408 × 10^-2 
       2    5.179 × 10^-3    9.222 × 10^-3 
9. y= 1.25 


11. We have f(1.09) ≈ 0.2826. The actual error is 4.3 × 10^-5, and an error bound is 7.4 × 10^-6. The discrepancy is due to the 
fact that the data are given to only four decimal places, and only four-digit arithmetic is used. 


13. a. P2(x) = −11.22388889x^2 + 3.810500000x + 1, and an error bound is 0.11371294. 
b. P2(x) = −0.1306344167x^2 + 0.8969979335x − 0.63249693, and an error bound is 9.45762 × 10^-4. 
c. P3(x) = 0.1970056667x^3 − 1.06259055x^2 + 2.532453189x − 1.666868305, and an error bound is 10^-4. 
d. P3(x) = —0.07932x? — 0.545506x* + 1.0065992x + 1, and an error bound is 1.591376 x 107°. 
15. Using 10 digits gives 
P3(x) = 1.302637066x^3 − 3.511333118x^2 + 4.071141936x − 1.670043560, P3(1.09) = 0.282639050, 
and |f(1.09) − P3(1.09)| = 3.8646 × 10^-6. 
17. The largest possible step size is 0.004291932, so 0.004 would be a reasonable choice. 
19. a. Sample 1: P6(x) = 6.67 − 42.6434x + 16.1427x^2 − 2.09464x^3 + 0.126902x^4 − 0.00367168x^5 + 0.0000409458x^6; 
Sample 2: P6(x) = 6.67 − 5.67821x + 2.91281x^2 − 0.413799x^3 + 0.0258413x^4 − 0.000752546x^5 + 0.00000836160x^6 
b. Sample 1: 42.71 mg; Sample 2: 19.42 mg 
21. Since g(x) = g(x0) = 0, there exists a number ξ1 between x and x0, for which g'(ξ1) = 0. Also, g'(x0) = 0, so there exists a 
number ξ2 between x0 and ξ1, for which g''(ξ2) = 0. The process is continued by induction to show that a number ξ_{n+1} 
between x0 and ξn exists with g^(n+1)(ξ_{n+1}) = 0. The error formula for Taylor polynomials follows. 
23. a. (i) B3(x) = x (ii) B3(x) = 1 b. n ≥ 250,000 


Exercise Set 3.2 (Page 123) 


1. 
3. 


11. 


13. 


The approximations are the same as in Exercise 5 of Section 3.1. 
a. We have √3 ≈ P4(1/2) = 1.7083. b. We have √3 ≈ P4(3) = 1.690607. 


c. Absolute error in part (a) is approximately 0.0237, and the absolute error in part (b) is 0.0414, so part (a) is more 
accurate. 


© Poi23(2.5) = 2.875 
. The incorrect approximation is — f(2)/6+ 2 f(1)/3 + 4/3 + 2f(—1)/3 — f(—2)/6. and the correct approximation is 


—f(2)/6+2f()/3+2f(—1)/3 — f(—2)/6, so the incorrect approximation is 4/3 too large. 
The first ten terms of the sequence are 0.038462, 0.333671, 0.116605, −0.371760, −0.0548919, 0.605935, 0.190249, 
−0.513353, −0.0668173, and 0.448335. Since f(1 + √10) = 0.0545716, the sequence does not appear to converge. 


Change Algorithm 3.1 as follows: 

INPUT numbers y0, y1, ..., yn; values x0, x1, ..., xn as the first column Q_{0,0}, Q_{1,0}, ..., Q_{n,0} of Q. 
OUTPUT the table Q with Q_{n,n} approximating f^{-1}(0). 

Step 1 For i = 1, 2, ..., n 
           for j = 1, 2, ..., i 
               set Q_{i,j} = (y_i Q_{i−1,j−1} − y_{i−j} Q_{i,j−1}) / (y_i − y_{i−j}). 


Exercise Set 3.3 (Page 133) 


1. a. P1(x) = 16.9441 + 3.1041(x − 8.1); P1(8.4) = 17.87533 
P2(x) = P1(x) + 0.06(x − 8.1)(x − 8.3); P2(8.4) = 17.87713 
P3(x) = P2(x) − 0.00208333(x − 8.1)(x − 8.3)(x − 8.6); P3(8.4) = 17.87714 

b. P1(x) = −0.1769446 + 1.9069687(x − 0.6); P1(0.9) = 0.395146 
P2(x) = P1(x) + 0.959224(x − 0.6)(x − 0.7); P2(0.9) = 0.4526995 
P3(x) = P2(x) − 1.785741(x − 0.6)(x − 0.7)(x − 0.8); P3(0.9) = 0.4419850 

3. In the following equations, we have s = (1/h)(x − x0). 

a. P1(s) = −0.718125 − 0.0470625s; P1(−1/3) = −0.006625 
P2(s) = P1(s) + 0.312625s(s − 1)/2; P2(−1/3) = 0.1803056 
P3(s) = P2(s) + 0.09375s(s − 1)(s − 2)/6; P3(−1/3) = 0.1745185 
b. P1(s) = −0.62049958 + 0.3365129s; P1(0.25) = −0.1157302 
P2(s) = P1(s) − 0.04592527s(s − 1)/2; P2(0.25) = −0.1329522 
P3(s) = P2(s) − 0.00283891s(s − 1)(s − 2)/6; P3(0.25) = −0.1327748 

5. In the following equations, we have s = (1/h)(x − xn). 

a. P1(s) = 1.101 + 0.7660625s; f(−1/3) ≈ P1(−4/3) = 0.07958333 
P2(s) = P1(s) + 0.406375s(s + 1)/2; f(−1/3) ≈ P2(−4/3) = 0.1698889 
P3(s) = P2(s) + 0.09375s(s + 1)(s + 2)/6; f(−1/3) ≈ P3(−4/3) = 0.1745185 
b. P1(s) = 0.2484244 + 0.2418235s; f(0.25) ≈ P1(−1.5) = −0.1143108 
P2(s) = P1(s) − 0.04876419s(s + 1)/2; f(0.25) ≈ P2(−1.5) = −0.1325973 
P3(s) = P2(s) − 0.00283891s(s + 1)(s + 2)/6; f(0.25) ≈ P3(−1.5) = −0.1327748 
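
The Newton divided-difference forms in Exercise 1(a) can be evaluated by nested multiplication. The sketch below takes the coefficients 16.9441, 3.1041, 0.06, and −0.00208333 and the nodes 8.1, 8.3, 8.6 from that answer; the function name and argument layout are illustrative, not the text's algorithm.

```python
def newton_form(coeffs, nodes, x):
    """Evaluate c0 + c1(x - x0) + c2(x - x0)(x - x1) + ... by nested multiplication.

    coeffs has k entries; only the first k - 1 nodes are used.
    """
    result = coeffs[-1]
    used_nodes = nodes[:len(coeffs) - 1]
    for c, node in zip(reversed(coeffs[:-1]), reversed(used_nodes)):
        result = result * (x - node) + c
    return result

coeffs = [16.9441, 3.1041, 0.06, -0.00208333]
nodes = [8.1, 8.3, 8.6]

p1 = newton_form(coeffs[:2], nodes, 8.4)   # linear interpolant
p2 = newton_form(coeffs[:3], nodes, 8.4)   # quadratic interpolant
p3 = newton_form(coeffs, nodes, 8.4)       # cubic interpolant
print(f"{p1:.5f} {p2:.5f} {p3:.5f}")
```

Truncating the coefficient list gives the lower-degree interpolants, which is why each P_k is written as P_{k−1} plus one new term.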


7. a. P3(x) = 5.3 − 33(x + 0.1) + 129.83(x + 0.1)x − 556.6(x + 0.1)x(x − 0.2) 
b. P4(x) = P3(x) + 2730.243387(x + 0.1)x(x − 0.2)(x − 0.3) 
9. a. f(0.05) ≈ 1.05126 b. f(0.65) ≈ 1.91555 c. f(0.43) ≈ 1.53725 
11. a. P(−2) = Q(−2) = −1, P(−1) = Q(−1) = 3, P(0) = Q(0) = 1, P(1) = Q(1) = −1, P(2) = Q(2) = 3 
b. The format of the polynomial is not unique. If P(x) and Q(x) are expanded, they are identical. There is only one 
interpolating polynomial if the degree is less than or equal to four for the given data. However, it can be expressed in 
various ways depending on the application. 
13. The coefficient of x? is 3.5. 
15. The approximation to f(0.3) should be increased by 5.9375. 
17. f[x0] = f(x0) = 1, f[x1] = f(x1) = 3, f[x0, x1] = 5 


19. Since f[x2] = f[x0] + f[x0, x1](x2 − x0) + a2(x2 − x0)(x2 − x1), 

a2 = (f[x2] − f[x0]) / ((x2 − x0)(x2 − x1)) − f[x0, x1] / (x2 − x1). 

This simplifies to f[x0, x1, x2]. 
21. Let P̃(x) = f[x_{i0}] + Σ_{k=1}^{n} f[x_{i0}, ..., x_{ik}](x − x_{i0}) ··· (x − x_{i(k−1)}) and P̂(x) = f[x0] + Σ_{k=1}^{n} f[x0, ..., xk](x − x0) ··· (x − x_{k−1}). The 
polynomial P̃(x) interpolates f(x) at the nodes x_{i0}, ..., x_{in}, and the polynomial P̂(x) interpolates f(x) at the nodes x0, ..., xn. 
Since both sets of nodes are the same and the interpolating polynomial is unique, we have P̃(x) = P̂(x). The coefficient of x^n 
in P̃(x) is f[x_{i0}, ..., x_{in}], and the coefficient of x^n in P̂(x) is f[x0, ..., xn]. Thus, f[x_{i0}, ..., x_{in}] = f[x0, ..., xn]. 


Exercise Set 3.4 (Page 142) 


1. The coefficients for the polynomials in divided-difference form are given in the following tables. For example, the 
polynomial in part (a) is 

H3(x) = 17.56492 + 3.116256(x − 8.3) + 0.05948(x − 8.3)^2 − 0.00202222(x − 8.3)^2(x − 8.6). 


a. b. c. d. 

17.56492 0.22363362 —0.02475 —0.62049958 
3.116256 2.1691753 0.751 3.5850208 
0.05948 0.01558225 2.751 —2.1989182 

—0.00202222 —3.2177925 1 —0.490447 
0 0.037205 
0 0.040475 
—0.0025277777 
0.0029629628 
3.    x       Approximation to f(x)    Actual f(x)     Error 
a.    8.4     17.877144                17.877146       2.33 × 10^-6 
b.    0.9     0.44392477               0.44359244      3.3323 × 10^-4 
c.    −1/3    0.1745185                0.17451852      1.85 × 10^-8 
d.    0.25    −0.1327719               −0.13277189     5.42 × 10^-9 
5. a. We have sin 0.34 ≈ H5(0.34) = 0.33349. 

b. The formula gives an error bound of 3.05 × 10^-14, but the actual error is 2.91 × 10^-6. The discrepancy is due to the fact 
that the data are given to only five decimal places. 

c. We have sin 0.34 * H7(0.34) = 0.33350. Although the error bound is now 5.4 x 107°, the accuracy of the given data 
dominates the calculations. This result is actually less accurate than the approximation in part (b), since 
sin 0.34 = 0.333487. 
7. For 3(a), we have an error bound of 5.9 × 10^-8. The error bound for 3(c) is 0 since f^(n)(x) = 0, for n > 3. 


9. H3(1.25) = 1.169080403 with an error bound of 4.81 x 107°, and H5(1.25) = 1.169016064 with an error bound of 
4.43 x 1074. 


Exercise Set 3.5 (Page 161) 


1. S(x) =x on [0, 2]. 
3. The equations of the respective free cubic splines are 


S(x) = S_i(x) = a_i + b_i(x − x_i) + c_i(x − x_i)^2 + d_i(x − x_i)^3, 


for x in [x;,x;41], where the coefficients are given in the following tables. 


a.  i    a_i            b_i           c_i            d_i 
    0    17.564920      3.13410000    0.00000000     0.00000000 
b.  i    a_i            b_i           c_i            d_i 
    0    0.22363362     2.17229175    0.00000000     0.00000000 
c.  i    a_i            b_i           c_i            d_i 
    0    −0.02475000    1.03237500    0.00000000     6.50200000 
    1    0.33493750     2.25150000    4.87650000     −6.50200000 
d.  i    a_i            b_i           c_i            d_i 
    0    −0.62049958    3.45508693    0.00000000     −8.9957933 
    1    −0.28398668    3.18521313    −2.69873800    −0.94630333 
    2    0.00660095     2.61707643    −2.98262900    9.9420966 
5.    x       Approximation to f(x)    Actual f(x)     Error 
a.    8.4     17.87833                 17.877146       1.1840 × 10^-3 
b.    0.9     0.4408628                0.44359244      2.7296 × 10^-3 
c.    −1/3    0.1774144                0.17451852      2.8959 × 10^-3 
d.    0.25    −0.1315912               −0.13277189     1.1807 × 10^-3 

      x       Approximation to f'(x)   Actual f'(x)    Error 
a.    8.4     3.134100                 3.128232        5.86829 × 10^-3 
b.    0.9     2.172292                 2.204367        0.0320747 
c.    −1/3    1.574208                 1.668000        0.093792 
d.    0.25    2.908242                 2.907061        1.18057 × 10^-3 


7. The equations of the respective clamped cubic splines are 
s(x) = s_i(x) = a_i + b_i(x − x_i) + c_i(x − x_i)^2 + d_i(x − x_i)^3, 


for x in [x;,x;41], where the coefficients are given in the following tables. 


a.  i    a_i            b_i           c_i           d_i 
    0    17.564920      3.1162560     0.060087      −0.002022 
b.  i    a_i            b_i           c_i           d_i 
    0    0.22363362     2.1691753     0.65914075    −3.2177925 
c.  i    a_i            b_i           c_i           d_i 
    0    −0.02475000    0.75100000    2.5010000     1.0000000 
    1    0.33493750     2.18900000    3.2510000     1.0000000 
d.  i    a_i            b_i           c_i           d_i 
    0    −0.62049958    3.5850208     −2.1498407    −0.49077413 
    1    −0.28398668    3.1403294     −2.2970730    −0.47458360 
    2    0.006600950    2.6666773     −2.4394481    −0.44980146 
9.    x       Approximation to f(x)    Actual f(x)     Error 
a.    8.4     17.877144                17.877146       0.188 × 10^-5 
b.    0.9     0.4439248                0.44359244      3.323 × 10^-4 
c.    −1/3    0.17451852               0.17451852      0 
d.    0.25    −0.13277221              −0.13277189     3.19 × 10^-7 

      x       Approximation to f'(x)   Actual f'(x)    Error 
a.    8.4     3.128213                 3.128232        1.90 × 10^-5 
b.    0.9     2.204470                 2.204367        1.0296 × 10^-4 
c.    −1/3    1.668000                 1.668000        0 
d.    0.25    2.908242                 2.907061        1.18057 × 10^-3 
11. b = −1, c = −3, d = 1 


13. 
15. 


1 1 1 1 
B=7,D=4,b=-7,d=j 


The equation of the spline is 


S(x) = S_i(x) = a_i + b_i(x − x_i) + c_i(x − x_i)^2 + d_i(x − x_i)^3, 

for x in [x_i, x_{i+1}], where the coefficients are given in the following table. 

x_i     a_i           b_i           c_i          d_i 
0       1.0           −0.7573593    0.0          −6.627417 
0.25    0.7071068     −2.0          −4.970563    6.627417 
0.5     0.0           −3.242641     0.0          6.627417 
0.75    −0.7071068    −2.0          4.970563     −6.627417 

∫_0^1 S(x) dx = 0.000000, S'(0.5) = −3.24264, and S''(0.5) = 0.0 
17. The equation of the spline is 

s(x) = s_i(x) = a_i + b_i(x − x_i) + c_i(x − x_i)^2 + d_i(x − x_i)^3, 

for x in [x_i, x_{i+1}], where the coefficients are given in the following table. 

x_i     a_i           b_i          c_i          d_i 
0       1.0           0.0          −5.193321    2.028118 
0.25    0.7071068     −2.216388    −3.672233    4.896310 
0.5     0.0           −3.134447    0.0          4.896310 
0.75    −0.7071068    −2.216388    3.672233     2.028118 

∫_0^1 s(x) dx = 0.000000, s'(0.5) = −3.13445, and s''(0.5) = 0.0 


19. Let f(x) = a + bx + cx^2 + dx^3. Clearly, f satisfies properties (a), (c), (d), and (e) of Definition 3.10, and f interpolates 
itself for any choice of x0, ..., xn. Since (ii) of property (f) in Definition 3.10 holds, f must be its own clamped cubic 
spline. However, f''(x) = 2c + 6dx can be zero only at x = −c/3d. Thus, part (i) of property (f) in Definition 3.10 cannot 
hold at two values x0 and xn. Thus, f cannot be a natural cubic spline. 


21. The piecewise linear approximation to f is given by 

F(x) = 20(e^{0.1} − 1)x + 1, for x in [0, 0.05], 
F(x) = 20(e^{0.2} − e^{0.1})x + 2e^{0.1} − e^{0.2}, for x in (0.05, 1]. 

We have 

∫_0^{0.1} F(x) dx = 0.1107936 and ∫_0^{0.1} f(x) dx = 0.1107014. 


25. a. On [0, 0.05], we have s(x) = 1.000000 + 1.999999x + 1.998302x^2 + 1.401310x^3, and on (0.05, 0.1], we have 
s(x) = 1.105170 + 2.210340(x − 0.05) + 2.208498(x − 0.05)^2 + 1.548758(x − 0.05)^3. 

b. ∫_0^{0.1} s(x) dx = 0.110701 
c. 1.6 × 10^-7 

d. On [0, 0.05], we have S(x) = 1 + 2.04811x + 22.12184x^3, and on (0.05, 0.1], we have 
S(x) = 1.105171 + 2.214028(x − 0.05) + 3.318277(x − 0.05)^2 − 22.12184(x − 0.05)^3. S(0.02) = 1.041139 and 
s(0.02) = 1.040811. 


27. S(x) = 2x - x^2 for 0 <= x <= 1, and S(x) = 1 + (x - 1)^2 for 1 <= x <= 2
29. The spline has the equation
s(x) = s_i(x) = a_i + b_i(x - x_i) + c_i(x - x_i)^2 + d_i(x - x_i)^3,

for x in [x_i, x_{i+1}], where the coefficients are given in the following table.

x_i a_i b_i c_i d_i

0 0 75 —0.659292 0.219764 
3 225 76.9779 1.31858 —0.153761 
5 383 80.4071 0.396018 —0.177237 
8 623 77.9978 —1.19912 0.0799115 


The spline predicts a position of s(10) = 774.84 ft and a speed of s'(10) = 74.16 ft/s. To maximize the speed, we find the
single critical point of s'(x) and compare the values of s'(x) at this point and the endpoints. We find that max
s'(x) = s'(5.7448) = 80.7 ft/s = 55.02 mi/h. The speed 55 mi/h was first exceeded at approximately 5.5 s.
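The predictions follow from the table by direct evaluation. Here is a minimal sketch, using only the tabulated coefficients; the function names are illustrative. The critical point of s' on the piece starting at x = 5 is where s''(x) = 2c + 6d(x - 5) vanishes.

```python
# (x_i, a_i, b_i, c_i, d_i) copied from the table for Exercise 29.
PIECES = [
    (0, 0, 75, -0.659292, 0.219764),
    (3, 225, 76.9779, 1.31858, -0.153761),
    (5, 383, 80.4071, 0.396018, -0.177237),
    (8, 623, 77.9978, -1.19912, 0.0799115),
]


def position(piece, x):
    """s_i(x) in feet."""
    xi, a, b, c, d = piece
    t = x - xi
    return a + b * t + c * t**2 + d * t**3


def speed(piece, x):
    """s_i'(x) in feet per second."""
    xi, _, b, c, d = piece
    t = x - xi
    return b + 2 * c * t + 3 * d * t**2


last = PIECES[3]
position_at_10 = position(last, 10)
speed_at_10 = speed(last, 10)

# On the piece starting at x = 5: s''(x) = 0 at x = 5 - c / (3 d).
xi, _, _, c, d = PIECES[2]
x_peak = xi - c / (3 * d)
peak_speed = speed(PIECES[2], x_peak)
peak_speed_mph = peak_speed * 3600 / 5280   # ft/s to mi/h

print(position_at_10, speed_at_10, x_peak, peak_speed, peak_speed_mph)
```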


31. The equation of the spline is
S(x) = S_i(x) = a_i + b_i(x - x_i) + c_i(x - x_i)^2 + d_i(x - x_i)^3,
for x in [x_i, x_{i+1}], where the coefficients are given in the following table.
     Sample 1                              Sample 2
x_i  a_i    b_i       c_i       d_i       a_i    b_i       c_i       d_i
0    6.67   -0.44687  0         0.06176   6.67   1.6629    0         -0.00249
6    17.33  6.2237    1.1118    -0.27099  16.11  1.3943    -0.04477  -0.03251
10   42.67  2.1104    -2.1401   0.28109   18.89  -0.52442  -0.43490  0.05916
13   37.33  -3.1406   0.38974   -0.01411  15.00  -1.5365   0.09756   0.00226
17   30.10  -0.70021  0.22036   -0.02491  10.56  -0.64732  0.12473   -0.01113
20   29.31  -0.05069  -0.00386  0.00016   9.44   -0.19955  0.02453   -0.00102


33. The three natural splines have equations of the form
S_i(x) = a_i + b_i(x - x_i) + c_i(x - x_i)^2 + d_i(x - x_i)^3,

for x in [x_i, x_{i+1}], where the values of the coefficients are given in the following tables.

Spline 1
i x_i a_i = f(x_i) b_i c_i d_i
0 1 3.0 0.786 0.0 -0.086
1 2 3.7 0.529 —0.257 0.034 
2 5 3.9 —0.086 0.052 0.334 
3 6 4.2 1.019 1.053 —0.572 
4 7 5.7 1.408 —0.664 0.156 
5 8 6.6 0.547 —0.197 0.024 
6 10 7.1 0.049 -0.052 -0.003
7 13 6.7 —0.342 —0.078 0.007 
8 17 4.5 

Spline 2 
i x_i a_i = f(x_i) b_i c_i d_i
0 17 4.5 1.106 0.0 —0.030 
1 20 7.0 0.289 —0.272 0.025 
2 23 6.1 —0.660 —0.044 0.204 
3 24 5.6 —0.137 0.567 —0.230 
4 25 5.8 0.306 —0.124 —0.089 
5 27 5.2 —1.263 —0.660 0.314 
6 27.7 4.1 

Spline 3 
i x_i a_i = f(x_i) b_i c_i d_i
0 27.7 4.1 0.749 0.0 —0.910 
1 28 4.3 0.503 —0.819 0.116 
2 29 4.1 —0.787 —0.470 0.157 
3 30 3.0 


Exercise Set 3.6 (Page 170) 


1. a. x(t) = -10t^3 + 14t^2 + t, y(t) = -2t^3 + 3t^2 + t
b. x(t) = -10t^3 + 14.5t^2 + 0.5t, y(t) = -3t^3 + 4.5t^2 + 0.5t
c. x(t) = -10t^3 + 14t^2 + t, y(t) = -4t^3 + 5t^2 + t
d. x(t) = -10t^3 + 13t^2 + 2t, y(t) = 2t
3. a. x(t) = -11.5t^3 + 15t^2 + 1.5t + 1, y(t) = -4.25t^3 + 4.5t^2 + 0.75t + 1
b. x(t) = -6.25t^3 + 10.5t^2 + 0.75t + 1, y(t) = -3.5t^3 + 3t^2 + 1.5t + 1
c. For t between (0,0) and (4,6), we have
x(t) = -5t^3 + 7.5t^2 + 1.5t, y(t) = -13.5t^3 + 18t^2 + 1.5t,
and for t between (4,6) and (6,1), we have
x(t) = -5.5t^3 + 6t^2 + 1.5t + 4, y(t) = 4t^3 - 6t^2 - 3t + 6.
d. For t between (0,0) and (2,1), we have
x(t) = -5.5t^3 + 6t^2 + 1.5t, y(t) = -0.5t^3 + 1.5t,
for t between (2,1) and (4,0), we have
x(t) = -4t^3 + 3t^2 + 3t + 2, y(t) = -t^3 + 1,
and for t between (4,0) and (6,-1), we have
x(t) = -8.5t^3 + 13.5t^2 - 3t + 4, y(t) = -3.25t^3 + 5.25t^2 - 3t.
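A quick sanity check on any of these parametric cubics is that t = 0 and t = 1 reproduce the stated endpoints. The sketch below does this for the two pieces of Exercise 3(c); the coefficient tuples are transcribed from that answer and the helper names are illustrative.

```python
def poly(coeffs):
    """Return t -> c3 t^3 + c2 t^2 + c1 t + c0, evaluated by Horner's rule."""
    c3, c2, c1, c0 = coeffs
    return lambda t: ((c3 * t + c2) * t + c1) * t + c0


# Exercise 3(c): (x(t), y(t)) for each piece.
first = (poly((-5, 7.5, 1.5, 0)), poly((-13.5, 18, 1.5, 0)))
second = (poly((-5.5, 6, 1.5, 4)), poly((4, -6, -3, 6)))


def endpoints(curve):
    """Points reached at t = 0 and t = 1."""
    x, y = curve
    return (x(0), y(0)), (x(1), y(1))


print(endpoints(first))
print(endpoints(second))
```

The first piece should run from (0,0) to (4,6) and the second from (4,6) to (6,1), so the two pieces join continuously.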


5. a. Using the forward divided difference gives the following table.

0  u_0
0  u_0  3(u_1 - u_0)
1  u_3  u_3 - u_0      u_3 - 3u_1 + 2u_0
1  u_3  3(u_3 - u_2)   2u_3 - 3u_2 + u_0   u_3 - 3u_2 + 3u_1 - u_0

Therefore

u(t) = u_0 + 3(u_1 - u_0)t + (u_3 - 3u_1 + 2u_0)t^2 + (u_3 - 3u_2 + 3u_1 - u_0)t^2(t - 1)
     = u_0 + 3(u_1 - u_0)t + (-6u_1 + 3u_0 + 3u_2)t^2 + (u_3 - 3u_2 + 3u_1 - u_0)t^3.

Similarly, v(t) = v_0 + 3(v_1 - v_0)t + (3v_2 - 6v_1 + 3v_0)t^2 + (v_3 - 3v_2 + 3v_1 - v_0)t^3.
b. Using the formula for Bernstein polynomials gives

u(t) = u_0(1 - t)^3 + 3u_1 t(1 - t)^2 + 3u_2 t^2(1 - t) + u_3 t^3
     = u_0 + 3(u_1 - u_0)t + (3u_2 - 6u_1 + 3u_0)t^2 + (u_3 - 3u_2 + 3u_1 - u_0)t^3.

Similarly,
v(t) = \sum_{k=0}^{3} \binom{3}{k} v_k t^k (1 - t)^{3-k}
     = v_0 + 3(v_1 - v_0)t + (3v_2 - 6v_1 + 3v_0)t^2 + (v_3 - 3v_2 + 3v_1 - v_0)t^3.


Exercise Set 4.1 (Page 182) 


1. From the forward-backward difference formula (4.1), we have the following approximations:

a. f'(0.5) ≈ 0.8520, f'(0.6) ≈ 0.8520, f'(0.7) ≈ 0.7960
b. f'(0.0) ≈ 3.7070, f'(0.2) ≈ 3.1520, f'(0.4) ≈ 3.1520
3. a. x Actual Error Error Bound b. x Actual Error Error Bound
0.5 0.0255 0.0282 0.0 0.2930 0.3000
0.6 0.0267 0.0282 0.2 0.2694 0.2779
0.7 0.0312 0.0322 0.4 0.2602 0.2779


5. For the endpoints of the tables, we use Formula (4.4). The other approximations come from Formula (4.5).
a. f'(1.1) ≈ 17.769705, f'(1.2) ≈ 22.193635, f'(1.3) ≈ 27.107350, f'(1.4) ≈ 32.150850
b. f'(8.1) ≈ 3.092050, f'(8.3) ≈ 3.116150, f'(8.5) ≈ 3.139975, f'(8.7) ≈ 3.163525
c. f'(2.9) ≈ 5.101375, f'(3.0) ≈ 6.654785, f'(3.1) ≈ 8.216330, f'(3.2) ≈ 9.786010
d. f'(2.0) ≈ 0.13533150, f'(2.1) ≈ -0.09989550, f'(2.2) ≈ -0.3298960, f'(2.3) ≈ -0.5546700


7. a. x Actual Error Error Bound b. x Actual Error Error Bound
1.1 0.280322 0.359033 8.1 0.00018594 0.000020322
1.2 0.147282 0.179517 8.3 0.00010551 0.000010161
1.3 0.179874 0.219262 8.5 9.116 x 10^{-5} 0.000009677
1.4 0.378444 0.438524 8.7 0.00020197 0.000019355
c. x Actual Error Error Bound d. x Actual Error Error Bound
2.9 0.011956 0.0180988 2.0 0.00252235 0.00410304
3.0 0.0049251 0.00904938 2.1 0.00142882 0.00205152
3.1 0.0004765 0.00493920 2.2 0.00204851 0.00260034
3.2 0.0013745 0.00987840 2.3 0.00437954 0.00520068


9. The approximations and the formulas used are: 
a. f'(2.1) ≈ 3.899344 from (4.7), f'(2.2) ≈ 2.876876 from (4.7), f'(2.3) ≈ 2.249704 from (4.6), f'(2.4) ≈ 1.837756
from (4.6), f'(2.5) ≈ 1.544210 from (4.7), f'(2.6) ≈ 1.355496 from (4.7)
b. f'(-3.0) ≈ -5.877358 from (4.7), f'(-2.8) ≈ -5.468933 from (4.7), f'(-2.6) ≈ -5.059884 from (4.6),
f'(-2.4) ≈ -4.650223 from (4.6), f'(-2.2) ≈ -4.239911 from (4.7), f'(-2.0) ≈ -3.828853 from (4.7)


11. a. x Actual Error Error Bound b. x Actual Error Error Bound
2.1 0.0242312 0.109271 -3.0 1.55 x 10^{-5} 6.33 x 10^{-7}
2.2 0.0105138 0.0386885 -2.8 1.32 x 10^{-5} 6.76 x 10^{-7}
2.3 0.0029352 0.0182120 -2.6 7.95 x 10^{-7} 1.05 x 10^{-7}
2.4 0.0013262 0.00644808 -2.4 6.79 x 10^{-7} 1.13 x 10^{-7}
2.5 0.0138323 0.109271 -2.2 1.28 x 10^{-5} 6.76 x 10^{-7}
2.6 0.0064225 0.0386885 -2.0 7.96 x 10^{-6} 6.76 x 10^{-7}


13. f'(3) ≈ (1/12)[f(1) - 8f(2) + 8f(4) - f(5)] = 0.21062, with an error bound given by

max_{1 <= x <= 5} |f^{(5)}(x)| h^4 / 30 <= 23/30 = 0.76.


15. From the forward-backward difference formula (4.1), we have the following approximations:
a. f'(0.5) ≈ 0.852, f'(0.6) ≈ 0.852, f'(0.7) ≈ 0.7960
b. f'(0.0) ≈ 3.707, f'(0.2) ≈ 3.153, f'(0.4) ≈ 3.153
17. For the endpoints of the tables, we use Formula (4.7). The other approximations come from Formula (4.6).
a. f'(2.1) ≈ 3.884, f'(2.2) ≈ 2.896, f'(2.3) ≈ 2.249, f'(2.4) ≈ 1.836, f'(2.5) ≈ 1.550, f'(2.6) ≈ 1.348
b. f'(-3.0) ≈ -5.883, f'(-2.8) ≈ -5.467, f'(-2.6) ≈ -5.059, f'(-2.4) ≈ -4.650, f'(-2.2) ≈ -4.208,
f'(-2.0) ≈ -3.875
19. The approximation is —4.8 x 10~°. f”(0.5) = 0. The error bound is 0.35874. The method is very accurate since the function 
is symmetric about x = 0.5. 
21. a. f'(0.2) ≈ -0.1951027 b. f'(1.0) ≈ -1.541415 c. f'(0.6) ≈ -0.6824175
23. f'(0.4) ≈ -0.4249840 and f'(0.8) ≈ -1.032772.
25. The three-point formulas give the results in the following table. 


Time | 0 | 3 | 5 | 8 | 10 | 13
Speed | 79 | 82.4 | 74.2 | 76.8 | 69.4 | 71.2


27. The approximations eventually become zero because the numerator becomes zero. 
29. Since e'(h) = -ε/h^2 + hM/3, we have e'(h) = 0 if and only if h = (3ε/M)^{1/3}. Also, e'(h) < 0 if h < (3ε/M)^{1/3} and e'(h) > 0
if h > (3ε/M)^{1/3}, so an absolute minimum for e(h) occurs at h = (3ε/M)^{1/3}.
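The optimal step size can be illustrated numerically. This sketch assumes the error bound has the form e(h) = ε/h + (h^2/6)M, which is the function whose derivative is quoted in the answer; the values of ε and M below are illustrative and are not taken from the exercise.

```python
def total_error_bound(h, eps, M):
    """e(h) = eps/h + (h^2 / 6) M: round-off term plus truncation term."""
    return eps / h + h**2 * M / 6


def error_bound_derivative(h, eps, M):
    """e'(h) = -eps/h^2 + h M / 3."""
    return -eps / h**2 + h * M / 3


def optimal_step(eps, M):
    """Step size h = (3 eps / M)^(1/3) at which e'(h) = 0."""
    return (3 * eps / M) ** (1 / 3)


eps, M = 5e-7, 1.0          # illustrative values only
h_star = optimal_step(eps, M)

for h in (0.5 * h_star, h_star, 2 * h_star):
    print(h, total_error_bound(h, eps, M))
```

Steps smaller than h_star are dominated by the round-off term ε/h, and larger steps by the truncation term.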


Exercise Set 4.2 (Page 191) 


1. a. f'(1) ≈ 1.0000109 b. f'(0) ≈ 2.0000000 c. f'(1.05) ≈ 2.2751459 d. f'(2.3) ≈ -19.646799
3. a. f'(1) ≈ 1.001 b. f'(0) ≈ 1.999 c. f'(1.05) ≈ 2.283 d. f'(2.3) ≈ -19.61

5. \int_0^{\pi} sin x dx ≈ 1.999999

9. Let

N_2(h) = N(h/3) + (N(h/3) - N(h))/2 and N_3(h) = N_2(h/3) + (N_2(h/3) - N_2(h))/8.

Then N_3(h) is an O(h^3) approximation to M.
11. Let N(h) = (1 + h)^{1/h}, N_2(h) = 2N(h/2) - N(h), N_3(h) = N_2(h/2) + (1/3)(N_2(h/2) - N_2(h)).
a. N(0.04) = 2.665836331, N(0.02) = 2.691588029, N(0.01) = 2.704813829
b. N_2(0.04) = 2.717339727, N_2(0.02) = 2.718039629. The O(h^3) approximation is N_3(0.04) = 2.718272931.
c. Yes, since the errors seem proportional to h for N(h), to h^2 for N_2(h), and to h^3 for N_3(h).
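The three levels of extrapolation in Exercise 11 translate directly into code. The following minimal sketch uses the definitions of N, N_2, and N_3 given in the answer; only the printing loop is an addition.

```python
import math


def N(h):
    """First-order approximation to e."""
    return (1 + h) ** (1 / h)


def N2(h):
    """One extrapolation step: removes the O(h) error term."""
    return 2 * N(h / 2) - N(h)


def N3(h):
    """Second extrapolation step: removes the O(h^2) error term."""
    return N2(h / 2) + (N2(h / 2) - N2(h)) / 3


for name, value in [("N(0.04)", N(0.04)), ("N2(0.04)", N2(0.04)), ("N3(0.04)", N3(0.04))]:
    print(f"{name} = {value:.9f}, error = {abs(math.e - value):.2e}")
```

Each level uses only values of N at h, h/2, and h/4, so N_3(0.04) needs exactly the three evaluations listed in part (a).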


15. c. k   | 4         | 8         | 16        | 32        | 64        | 128       | 256       | 512
    p_k | 2\sqrt{2} | 3.0614675 | 3.1214452 | 3.1365485 | 3.1403312 | 3.1412723 | 3.1415138 | 3.1415729
    P_k | 4         | 3.3137085 | 3.1825979 | 3.1517249 | 3.1441184 | 3.1422236 | 3.1417504 | 3.1416321

d. Values of p_k and P_k are given in the following tables, together with the extrapolation results:
For p_k:
2.8284271
3.0614675 3.1391476
3.1214452 3.1414377 3.1415904
3.1365485 3.1415829 3.1415926 3.1415927
3.1403312 3.1415921 3.1415927 3.1415927 3.1415927
For P_k:
4
3.3137085 3.0849447
3.1825979 3.1388943 3.1424910
3.1517249 3.1414339 3.1416032 3.1415891
3.1441184 3.1415829 3.1415928 3.1415926 3.1415927
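The extrapolation tables can be regenerated as follows. This sketch assumes p_k = k sin(π/k) and P_k = k tan(π/k) (the inscribed and circumscribed polygon values, which match the tabulated entries) and uses the fact that both have error expansions in even powers of 1/k, so column j divides by 4^j - 1.

```python
import math


def richardson_table(values):
    """Triangular extrapolation table for data whose error has only even powers."""
    table = [[v] for v in values]
    for i in range(1, len(values)):
        for j in range(1, i + 1):
            prev = table[i][j - 1]
            table[i].append(prev + (prev - table[i - 1][j - 1]) / (4**j - 1))
    return table


ks = [4, 8, 16, 32, 64]
p = [k * math.sin(math.pi / k) for k in ks]   # assumed formula for p_k
P = [k * math.tan(math.pi / k) for k in ks]   # assumed formula for P_k

p_table = richardson_table(p)
P_table = richardson_table(P)

for row in p_table:
    print(" ".join(f"{v:.7f}" for v in row))
```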


Exercise Set 4.3 (Page 202) 


1. The Trapezoidal rule gives the following approximations.

a. 0.265625 b. -0.2678571 c. 0.2280741 d. 0.1839397
e. -0.8666667 f. -0.1777643 g. 0.2180895 h. 4.1432597
3.  Actual Error    Error Bound
a. 0.071875        0.125
b. 7.943 x 10^{-4} 9.718 x 10^{-4}
c. 0.0358147       0.0396972
d. 0.0233369       0.1666667
e. 0.1326975       0.5617284
f. 9.443 x 10^{-4} 1.0707 x 10^{-3}
g. 0.0663431       0.0807455
h. 1.554631        2.298827
5. Simpson's rule gives the following approximations.

a. 0.1940104 b. -0.2670635 c. 0.1922453 d. 0.16240168
e. -0.7391053 f. -0.1768216 g. 0.1513826 h. 2.5836964
7.  Actual Error     Error Bound
a. 2.604 x 10^{-4}  2.6042 x 10^{-4}
b. 7.14 x 10^{-7}   9.92 x 10^{-7}
c. 1.406 x 10^{-5}  2.170 x 10^{-5}
d. 1.7989 x 10^{-3} 4.1667 x 10^{-4}
e. 5.1361 x 10^{-3} 0.063280
f. 1.549 x 10^{-6}  2.095 x 10^{-6}
g. 3.6381 x 10^{-4} 4.1507 x 10^{-4}
h. 4.9322 x 10^{-3} 0.1302826
9. The Midpoint rule gives the following approximations.

a. 0.1582031 b. -0.2666667 c. 0.1743309 d. 0.1516327
e. -0.6753247 f. -0.1768200 g. 0.1180292 h. 1.8039148
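The three single-interval rules are linked by the identity S = (T + 2M)/3, where T, M, and S are the Trapezoidal, Midpoint, and Simpson values for the same integral. The sketch below applies it to the listed values for parts (a), (d), and (h) as a consistency check; the dictionary layout is illustrative.

```python
# (Trapezoidal, Midpoint, Simpson) values listed for parts a, d, and h.
LISTED = {
    "a": (0.265625, 0.1582031, 0.1940104),
    "d": (0.1839397, 0.1516327, 0.16240168),
    "h": (4.1432597, 1.8039148, 2.5836964),
}


def simpson_from(trapezoid, midpoint):
    """Simpson's rule as a weighted average of the other two rules."""
    return (trapezoid + 2 * midpoint) / 3


discrepancies = {
    part: abs(simpson_from(t, m) - s) for part, (t, m, s) in LISTED.items()
}
print(discrepancies)
```

Discrepancies on the order of the last printed digit indicate that the three listed values belong to the same integral.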


11. Actual Error     Error Bound
a. 0.0355469        0.0625
b. 3.961 x 10^{-4}  4.859 x 10^{-4}
c. 0.0179285        0.0198486
d. 8.9701 x 10^{-3} 0.0833333
e. 0.0564448        0.2808642
f. 4.698 x 10^{-4}  5.353 x 10^{-4}
g. 0.0337172        0.0403728
h. 0.7847138        1.1494136

13. f(l) =4 

15. The degree of precision is 3. 

17. = 7 = F.0= 34 

19. cg =cy = ; gives the highest degree of precision, 1. 


21. The following approximations are obtained from Formula (4.23) through Formula (4.30), respectively.
a. 0.1024404, 0.1024598, 0.1024598, 0.1024598, 0.1024695, 0.1024663, 0.1024598, and 0.1024598
b. 0.7853982, 0.7853982, 0.7853982, 0.7853982, 0.7853982, 0.7853982, 0.7853982, and 0.7853982
c. 1.497171, 1.477536, 1.477529, 1.477523, 1.467719, 1.470981, 1.477512, and 1.477515
d. 4.950000, 2.740909, 2.563393, 2.385700, 1.636364, 1.767857, 2.074893, and 2.116379
e. 3.293182, 2.407901, 2.359772, 2.314751, 1.965260, 2.048634, 2.233251, and 2.249001
f. 0.5000000, 0.6958004, 0.7126032, 0.7306341, 0.7937005, 0.7834709, 0.7611137, and 0.7593572
23. The errors in Exercise 22 are 1.6 x 10~°, 5.3 x 10-8, —6.7 x 10-7, —7.2 x 1077, and —1.3 x 107°, respectively. 


25. If E(x^k) = 0 for all k = 0, 1, ..., n and E(x^{n+1}) ≠ 0, then with p_{n+1}(x) = x^{n+1}, we have a polynomial of degree n + 1 for
which E(p_{n+1}(x)) ≠ 0. Let p(x) = a_n x^n + ... + a_1 x + a_0 be any polynomial of degree less than or equal to n. Then
E(p(x)) = a_n E(x^n) + ... + a_1 E(x) + a_0 E(1) = 0. Conversely, if E(p(x)) = 0 for all polynomials of degree less than or
equal to n, it follows that E(x^k) = 0 for all k = 0, 1, ..., n. Let p_{n+1}(x) = a_{n+1} x^{n+1} + ... + a_0 be a polynomial of degree
n + 1 for which E(p_{n+1}(x)) ≠ 0. Since a_{n+1} ≠ 0, we have

x^{n+1} = (1/a_{n+1}) p_{n+1}(x) - (a_n/a_{n+1}) x^n - ... - a_0/a_{n+1}.

Then

E(x^{n+1}) = (1/a_{n+1}) E(p_{n+1}(x)) - (a_n/a_{n+1}) E(x^n) - ... - (a_0/a_{n+1}) E(1) = (1/a_{n+1}) E(p_{n+1}(x)) ≠ 0.

Thus, the quadrature formula has degree of precision n.


Exercise Set 4.4 (Page 210)


1. The Composite Trapezoidal rule approximations are:
a. 0.639900 b. 31.3653 c. 0.784241 d. -6.42872
e. -13.5760 f. 0.476977 g. 0.605498 h. 0.970926
3. a. 0.6363098 b. 22.47713 c. 0.783980 d. -6.274868
e. -14.18334 f. 0.4777547 g. 0.6043941 h. 0.9610554
5. The Composite Midpoint rule approximations are:
a. 0.633096 b. 11.1568 c. 0.786700 d. -6.11274
e. -14.9985 f. 0.478751 g. 0.602961 h. 0.947868
7. a. 3.15947567 b. 3.10933713 c. 3.00906003 
9. a = 0.75 
11. a. The Composite Trapezoidal rule requires h < 0.000922295 and n > 2168. 
b. The Composite Simpson’s rule requires h < 0.037658 and n > 54. 
c. The Composite Midpoint rule requires h < 0.00065216 and n > 3066. 
13. a. The Composite Trapezoidal rule requires h < 0.04382 and n > 46. The approximation is 0.405471. 
b. The Composite Simpson’s rule requires h < 0.44267 and n > 6. The approximation is 0.405466. 
c. The Composite Midpoint rule requires h < 0.03098 and n > 64. The approximation is 0.405460. 
15. a. Because the right and left limits at 0.1 and 0.2 for f, f', and f'' are the same, the functions are continuous on [0, 0.3].
However,

f'''(x) = 6 for 0 <= x <= 0.1, 12 for 0.1 < x <= 0.2, and 12 for 0.2 < x <= 0.3

is discontinuous at x = 0.1.
b. We have 0.302506 with an error bound of 1.9 x 10^{-4}.
c. We have 0.302425, and the value of the actual integral is the same.
17. a. For the Composite Trapezoidal rule, we have

E(f) = -(h^3/12) \sum_{j=1}^{n} f''(\xi_j) = -(h^2/12) \sum_{j=1}^{n} f''(\xi_j) h = -(h^2/12) \sum_{j=1}^{n} f''(\xi_j) \Delta x_j,

where \Delta x_j = x_{j+1} - x_j = h for each j. Since \sum_{j=1}^{n} f''(\xi_j) \Delta x_j is a Riemann sum for \int_a^b f''(x) dx = f'(b) - f'(a), we have

E(f) ≈ -(h^2/12)[f'(b) - f'(a)].

b. For the Composite Midpoint rule, we have

E(f) = (h^3/3) \sum_{j=1}^{n/2} f''(\xi_j) = (h^2/6) \sum_{j=1}^{n/2} f''(\xi_j)(2h).

But \sum_{j=1}^{n/2} f''(\xi_j)(2h) is a Riemann sum for \int_a^b f''(x) dx = f'(b) - f'(a), so

E(f) ≈ (h^2/6)[f'(b) - f'(a)].
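The Trapezoidal estimate in part (a) is easy to test numerically. This sketch uses an illustrative integrand, f(x) = e^x on [0, 1] with n = 50, which is not taken from the exercise; it compares the true error (exact integral minus approximation) with -(h^2/12)[f'(b) - f'(a)].

```python
import math


def composite_trapezoid(f, a, b, n):
    """Composite Trapezoidal rule with n equal subintervals."""
    h = (b - a) / n
    interior = sum(f(a + j * h) for j in range(1, n))
    return h * (0.5 * (f(a) + f(b)) + interior)


a, b, n = 0.0, 1.0, 50          # illustrative choices
h = (b - a) / n
exact = math.e - 1              # integral of e^x over [0, 1]

actual_error = exact - composite_trapezoid(math.exp, a, b, n)
# For f(x) = e^x we have f'(x) = e^x as well.
estimated_error = -(h**2 / 12) * (math.exp(b) - math.exp(a))

print(actual_error, estimated_error)
```

The two numbers agree to several digits because the neglected terms are of order h^4.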
19. a. The estimate using the Composite Trapezoidal rule is -(h^2/12) ln 2 = -6.296 x 10^{-6}.
b. The estimate using the Composite Simpson's rule is -h^4/240 = -3.75 x 10^{-6}.
c. The estimate using the Composite Midpoint rule is (h^2/6) ln 2 = 6.932 x 10^{-6}.

21. The length is approximately 15.8655. 

23. Composite Simpson’s rule with h = 0.25 gives 2.61972 s. 


25. The length is approximately 58.47082, using n = 100 in the Composite Simpson’s rule. 


Exercise Set 4.5 (Page 218) 


1. Romberg integration gives R_{3,3} as follows:

a. 0.1922593 b. 0.1606105 c. -0.1768200 d. 0.08875677
e. 2.5879685 f. -0.7341567 g. 0.6362135 h. 0.6426970
3. Romberg integration gives R_{4,4} as follows:

a. 0.1922594 b. 0.1606028 c. -0.1768200 d. 0.08875528
e. 2.5886272 f. -0.7339728 g. 0.6362134 h. 0.6426991

5. Romberg integration gives:
a. 0.19225936 with n = 4 b. 0.16060279 with n = 5 c. -0.17682002 with n = 4 d. 0.088755284 with n = 5
e. 2.5886286 with n = 6 f. -0.73396918 with n = 6 g. 0.63621335 with n = 4 h. 0.64269908 with n = 5
7. R_{3,3} = 11.5246
9. f(2.5) ≈ 0.43459


11. Ry, =5 
13. We have

R_{k,2} = (4R_{k,1} - R_{k-1,1})/3 = (1/3)[R_{k-1,1} + 2h_{k-1} \sum_{i=1}^{2^{k-2}} f(a + (i - 1/2)h_{k-1})], from (4.34),

= (1/3)[(h_{k-1}/2)(f(a) + f(b)) + h_{k-1} \sum_{i=1}^{2^{k-2}-1} f(a + i h_{k-1}) + 2h_{k-1} \sum_{i=1}^{2^{k-2}} f(a + (i - 1/2)h_{k-1})], from (4.34) with k - 1 instead of k,

= (1/3)[h_k(f(a) + f(b)) + 2h_k \sum_{i=1}^{2^{k-2}-1} f(a + 2i h_k) + 4h_k \sum_{i=1}^{2^{k-2}} f(a + (2i - 1)h_k)]

= (h/3)[f(a) + f(b) + 2 \sum_{i=1}^{M-1} f(a + 2ih) + 4 \sum_{i=1}^{M} f(a + (2i - 1)h)], where h = h_k and M = 2^{k-2}.
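The identity derived above, that the second Romberg column reproduces the Composite Simpson's rule, can be confirmed numerically. In this sketch the test integrand sin x on [0, π] is an illustrative choice, and R_{k,1} is taken to be the Composite Trapezoidal rule with 2^{k-1} subintervals.

```python
import math


def trapezoid(f, a, b, n):
    """Composite Trapezoidal rule with n subintervals."""
    h = (b - a) / n
    return h * (0.5 * (f(a) + f(b)) + sum(f(a + i * h) for i in range(1, n)))


def simpson(f, a, b, n):
    """Composite Simpson's rule with n (even) subintervals."""
    h = (b - a) / n
    odd = sum(f(a + (2 * i - 1) * h) for i in range(1, n // 2 + 1))
    even = sum(f(a + 2 * i * h) for i in range(1, n // 2))
    return h / 3 * (f(a) + f(b) + 4 * odd + 2 * even)


def romberg_second_column(f, a, b, k):
    """R_{k,2} = (4 R_{k,1} - R_{k-1,1}) / 3."""
    r_k = trapezoid(f, a, b, 2 ** (k - 1))
    r_km1 = trapezoid(f, a, b, 2 ** (k - 2))
    return (4 * r_k - r_km1) / 3


differences = [
    abs(romberg_second_column(math.sin, 0.0, math.pi, k)
        - simpson(math.sin, 0.0, math.pi, 2 ** (k - 1)))
    for k in range(2, 7)
]
print(differences)
```

The differences are at the level of floating-point round-off, as the derivation predicts.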


15. Equation (4.34) follows from

R_{k,1} = (h_k/2)[f(a) + f(b) + 2 \sum_{i=1}^{2^{k-1}-1} f(a + i h_k)]

= (h_k/2)[f(a) + f(b) + 2 \sum_{i=1}^{2^{k-1}-1} f(a + (i/2) h_{k-1})]

= (h_k/2)[f(a) + f(b) + 2 \sum_{i=1}^{2^{k-2}-1} f(a + i h_{k-1}) + 2 \sum_{i=1}^{2^{k-2}} f(a + (i - 1/2)h_{k-1})]

= (1/2){(h_{k-1}/2)[f(a) + f(b) + 2 \sum_{i=1}^{2^{k-2}-1} f(a + i h_{k-1})] + h_{k-1} \sum_{i=1}^{2^{k-2}} f(a + (i - 1/2)h_{k-1})}

= (1/2)[R_{k-1,1} + h_{k-1} \sum_{i=1}^{2^{k-2}} f(a + (i - 1/2)h_{k-1})].


Exercise Set 4.6 (Page 227)


1. Simpson's rule gives


a. S(1, 1.5) = 0.19224530, S(1, 1.25) = 0.039372434, S(1.25, 1.5) = 0.15288602, and the actual value is 0.19225935.
b. S(0, 1) = 0.16240168, S(0, 0.5) = 0.028861071, S(0.5, 1) = 0.13186140, and the actual value is 0.16060279.
c. S(0, 0.35) = -0.17682156, S(0, 0.175) = -0.087724382, S(0.175, 0.35) = -0.089095736, and the actual value is -0.17682002.
d. S(0, \pi/4) = 0.087995669, S(0, \pi/8) = 0.0058315797, S(\pi/8, \pi/4) = 0.082877624, and the actual value is 0.088755285.
e. S(0, \pi/4) = 2.5836964, S(0, \pi/8) = 0.33088926, S(\pi/8, \pi/4) = 2.2568121, and the actual value is 2.5886286.
f. S(1, 1.6) = -0.73910533, S(1, 1.3) = -0.26141244, S(1.3, 1.6) = -0.47305351, and the actual value is -0.73396917.
g. S(3, 3.5) = 0.63623873, S(3, 3.25) = 0.32567095, S(3.25, 3.5) = 0.31054412, and the actual value is 0.63621334.
h. S(0, \pi/4) = 0.64326905, S(0, \pi/8) = 0.37315002, S(\pi/8, \pi/4) = 0.26958270, and the actual value is 0.64269908.
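The whole-interval and half-interval Simpson values are the building blocks of adaptive quadrature. The sketch below reproduces part (h) under the assumption that the integrand is cos^2 x on [0, π/4], which matches the listed values and the exact value π/8 + 1/4 but is not stated in the answer. The final line uses the standard estimate that the error of the two-half sum is about 1/15 of its difference from the whole-interval value.

```python
import math


def simpson(f, a, b):
    """Simpson's rule S(a, b) on a single interval."""
    m = (a + b) / 2
    return (b - a) / 6 * (f(a) + 4 * f(m) + f(b))


def f(x):
    # Assumed integrand for part (h).
    return math.cos(x) ** 2


a, b = 0.0, math.pi / 4
m = (a + b) / 2

whole = simpson(f, a, b)
left, right = simpson(f, a, m), simpson(f, m, b)
exact = math.pi / 8 + 0.25

actual_error = abs(exact - (left + right))
error_estimate = abs(whole - (left + right)) / 15

print(whole, left, right, exact)
print(actual_error, error_estimate)
```

An adaptive routine would accept left + right when error_estimate falls below the tolerance and otherwise recurse on each half.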


3. Adaptive quadrature gives:

a. 108.555281 b. -1724.966983 c. -15.306308 d. -18.945949


5.  Simpson's Rule | Number of Evaluations | Error | Adaptive Quadrature | Number of Evaluations | Error
a. -0.21515695 | 57 | 6.3 x 10^{-6} | -0.21515062 | 229 | 1.0 x 10^{-8}
b. 0.95135226  | 83 | 9.6 x 10^{-6} | 0.95134257  | 217 | 1.1 x 10^{-7}
c. -6.2831813  | 41 | 4.0 x 10^{-6} | -6.2831852  | 109 | 1.1 x 10^{-7}
d. 5.8696024   | 27 | 2.6 x 10^{-6} | 5.8696044   | 109 | 4.0 x 10^{-9}
7. J." u(t) dt © 0.00001 
9. We have, for h = b - a,

|T(a, b) - T(a, (a+b)/2) - T((a+b)/2, b)| ≈ (h^3/16)|f''(\mu)|

and

|\int_a^b f(x) dx - T(a, (a+b)/2) - T((a+b)/2, b)| ≈ (h^3/48)|f''(\mu)|.

So

|\int_a^b f(x) dx - T(a, (a+b)/2) - T((a+b)/2, b)| ≈ (1/3)|T(a, b) - T(a, (a+b)/2) - T((a+b)/2, b)|.


Exercise Set 4.7 (Page 234) 


1. Gaussian quadrature gives:

a. 0.1922687 b. 0.1594104 c. -0.1768190 d. 0.08926302
e. 2.5913247 f. -0.7307230 g. 0.6361966 h. 0.6423172
3. Gaussian quadrature gives:

a. 0.1922594 b. 0.1606028 c. -0.1768200 d. 0.08875529
e. 2.5886327 f. -0.7339604 g. 0.6362133 h. 0.6426991
5. a = 1, b = 1, c = 1/3, d = -1/3


9. The exact value to 10 digits is 0.878884623. Part (a) gives 0.878884623, with absolute error 4 x 10^{-10}. Part (b) gives
0.878884546, with absolute error 7.66 x 10^{-8}. Part (c) gives 0.878387796, with absolute error 4.97 x 10^{-4}. All the
approximations require 8 function evaluations, and Gaussian quadrature for a given n chooses the interpolation nodes
optimally. The composite methods in (b) and (c) do not use these nodes so they should not be expected to give as accurate
results.
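To make the role of the optimally chosen nodes concrete, here is a minimal two-point Gauss-Legendre sketch. It assumes the integrand for Exercise 1(a) is x^2 ln x on [1, 1.5]; that choice reproduces the listed value 0.1922687 but is an inference, not something stated in the answers.

```python
import math


def gauss_legendre_2(f, a, b):
    """Two-point Gaussian quadrature on [a, b] using the nodes +-1/sqrt(3) on [-1, 1]."""
    r = 1 / math.sqrt(3)
    half, mid = (b - a) / 2, (a + b) / 2
    return half * (f(mid - half * r) + f(mid + half * r))


def f(x):
    # Assumed integrand for Exercise 1(a).
    return x**2 * math.log(x)


approximation = gauss_legendre_2(f, 1.0, 1.5)
print(round(approximation, 7))
```

With only two function evaluations the result already agrees with the actual value 0.19225935 to about five digits.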


Exercise Set 4.8 (Page 248) 


1. Algorithm 4.4 with n = m = 4 gives: 


a. 0.3115733 b. 0.2552526 c. 16.50864 d. 1.476684 
3. Algorithm 4.4 with n = 4 and m = 8,n=8 and m=4, and n =m = 6 gives: 

a. 0.5119875, 0.5118533, 0.5118722 b. 1.718857, 1.718220, 1.718385 

c. 1.001953, 1.000122, 1.000386 d. 0.7838542, 0.7833659, 0.7834362 

e. —1.985611, —1.999182, —1.997353 f. 2.004596, 2.000879, 2.000980 

g. 0.3084277, 0.3084562, 0.3084323 h. —22.61612, —19.85408, —20.14117 


5. Algorithm 4.5 with n = m = 2 gives:

a. 0.3115733 b. 0.2552446 c. 16.50863 d. 1.488875

7. Algorithm 4.5 with n = m = 3, n = 3 and m = 4, n = 4 and m = 3, and n = m = 4 gives:

a. 0.5118655, 0.5118445, 0.5118655, 0.5118445, 2.1 x 10^{-5}, 1.3 x 10^{-7}, 2.1 x 10^{-5}, 1.3 x 10^{-7}
b. 1.718163, 1.718302, 1.718139, 1.718277, 1.2 x 10^{-4}, 2.0 x 10^{-5}, 1.4 x 10^{-4}, 4.8 x 10^{-6}
c. 1.000000, 1.000000, 1.0000000, 1.000000, 0, 0, 0, 0
d. 0.7833333, 0.7833333, 0.7833333, 0.7833333, 0, 0, 0, 0
e. -1.991878, -2.000124, -1.991878, -2.000124, 8.1 x 10^{-3}, 1.2 x 10^{-4}, 8.1 x 10^{-3}, 1.2 x 10^{-4}
f. 2.001494, 2.000080, 2.001388, 1.999984, 1.5 x 10^{-3}, 8 x 10^{-5}, 1.4 x 10^{-3}, 1.6 x 10^{-5}
g. 0.3084151, 0.3084145, 0.3084246, 0.3084245, 10^{-5}, 5.5 x 10^{-7}, 1.1 x 10^{-5}, 6.4 x 10^{-7}
h. -12.74790, -21.21539, -11.83624, -20.30373, 7.0, 1.5, 7.9, 0.564

9. Algorithm 4.4 with n = m = 14 gives 0.1479103, and Algorithm 4.5 with n = m = 4 gives 0.1506823.
11. The approximation to the center of mass is (x, y), where x = 0.3806333 and y = 0.3822558.
13. The area is approximately 1.0402528.

15. Algorithm 4.6 with n = m = p = 2 gives the first listed value. The second is the exact result.

a. 5.204036, e(e^{0.5} - 1)(e - 1)^2 b. 0.08429784, 1/12 c. 0.08641975, 1/14
d. 0.09722222, 1/12 e. 7.103932, 2 + (1/2)\pi^2 f. 1.428074, (1/2)(e^2 + 1) - e
17. Algorithm 4.6 with n = m = p = 4 gives the first listed value. The second is from Algorithm 4.6 with n = m = p = 5.
a. 5.206447, 5.206447 b. 0.08333333, 0.08333333 c. 0.07142857, 0.07142857
d. 0.08333333, 0.08333333 e. 6.934912, 6.934801 f. 1.476207, 1.476246


19. The approximation 20.41887 requires 125 functional evaluations. 


Exercise Set 4.9 (Page 254) 


1. The Composite Simpson's rule gives:
a. 0.5284163 b. 4.266654 c. 0.4329748 d. 0.8802210

3. The Composite Simpson's rule gives:
a. 0.4112649 b. 0.2440679 c. 0.05501681 d. 0.2903746

5. The escape velocity is approximately 6.9450 mi/s.
7. a. \int_0^\infty e^{-x} f(x) dx ≈ 0.8535534 f(0.5857864) + 0.1464466 f(3.4142136)
b. \int_0^\infty e^{-x} f(x) dx ≈ 0.7110930 f(0.4157746) + 0.2785177 f(2.2942804) + 0.0103893 f(6.2899451)
9. n = 2: 2.9865139 n = 3: 2.9958198
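The weights and nodes in Exercise 7 can be tested against the known moments \int_0^\infty e^{-x} x^k dx = k!. The sketch below checks that the two-point rule is exact (to the printed precision) through degree 3 and the three-point rule through degree 5; the tolerances reflect the seven-digit rounding of the tabulated constants.

```python
import math

# (weight, node) pairs transcribed from Exercise 7.
TWO_POINT = [(0.8535534, 0.5857864), (0.1464466, 3.4142136)]
THREE_POINT = [(0.7110930, 0.4157746), (0.2785177, 2.2942804), (0.0103893, 6.2899451)]


def laguerre_rule(rule, f):
    """Approximate the integral of e^{-x} f(x) over [0, infinity)."""
    return sum(w * f(x) for w, x in rule)


def moment(rule, k):
    """Rule applied to f(x) = x^k; the exact value is k!."""
    return laguerre_rule(rule, lambda x: x**k)


for k in range(6):
    print(k, math.factorial(k), moment(TWO_POINT, k), moment(THREE_POINT, k))
```

The two-point column departs from k! starting at k = 4, which is the expected loss of exactness beyond degree 2n - 1.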
Exercise Set 5.1 (Page 264)

1. a. Since f(t, y) = y cos t, we have \partial f/\partial y (t, y) = cos t, and f satisfies a Lipschitz condition in y with L = 1 on

D = {(t, y) | 0 <= t <= 1, -\infty < y < \infty}.

Also, f is continuous on D, so there exists a unique solution, which is y(t) = e^{sin t}.

b. Since f(t, y) = (2/t)y + t^2 e^t, we have \partial f/\partial y = 2/t, and f satisfies a Lipschitz condition in y with L = 2 on

D = {(t, y) | 1 <= t <= 2, -\infty < y < \infty}.

Also, f is continuous on D, so there exists a unique solution, which is y(t) = t^2(e^t - e).

c. Since f(t, y) = -(2/t)y + t^2 e^t, we have \partial f/\partial y = -2/t, and f satisfies a Lipschitz condition in y with L = 2 on

D = {(t, y) | 1 <= t <= 2, -\infty < y < \infty}.

Also, f is continuous on D, so there exists a unique solution, which is
y(t) = (t^4 e^t - 4t^3 e^t + 12t^2 e^t - 24t e^t + 24e^t + (\sqrt{2} - 9)e)/t^2.

d. Since f(t, y) = 4t^3 y/(1 + t^4), we have \partial f/\partial y = 4t^3/(1 + t^4), and f satisfies a Lipschitz condition in y with L = 2 on

D = {(t, y) | 0 <= t <= 1, -\infty < y < \infty}.

Also, f is continuous on D, so there exists a unique solution, which is y(t) = 1 + t^4.


3. a. Lipschitz constant L = 1; it is a well-posed problem. 
b. Lipschitz constant L = 1; it is a well-posed problem. 
c. Lipschitz constant L = 1; it is a well-posed problem. 
d. The function f does not satisfy a Lipschitz condition, so Theorem 5.6 cannot be used.
5. a. Differentiating y^3 t + yt = 2 gives 3y^2 y' t + y^3 + y' t + y = 0. Solving for y' gives the original differential equation, and
setting t = 1 and y = 1 verifies the initial condition. To approximate y(2), use Newton's method to solve the equation
y^3 + y - 1 = 0. This gives y(2) ≈ 0.6823278.

b. Differentiating y sin t + t^2 e^y + 2y - 1 = 0 gives y' sin t + y cos t + 2t e^y + t^2 e^y y' + 2y' = 0. Solving for y' gives the original
differential equation, and setting t = 1 and y = 0 verifies the initial condition. To approximate y(2), use Newton's method
to solve the equation (2 + sin 2)y + 4e^y - 1 = 0. This gives y(2) ≈ -0.4946599.
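Both values of y(2) come from a scalar Newton iteration. The following is a minimal sketch; the starting guesses, tolerance, and function names are illustrative choices, not part of the original answers.

```python
import math


def newton(g, dg, y0, tol=1e-12, max_iter=50):
    """Newton's method for g(y) = 0 with derivative dg, starting from y0."""
    y = y0
    for _ in range(max_iter):
        step = g(y) / dg(y)
        y -= step
        if abs(step) < tol:
            return y
    raise RuntimeError("Newton's method did not converge")


# Part (a): y^3 + y - 1 = 0.
y2_a = newton(lambda y: y**3 + y - 1, lambda y: 3 * y**2 + 1, 1.0)

# Part (b): (2 + sin 2) y + 4 e^y - 1 = 0.
s2 = math.sin(2)
y2_b = newton(
    lambda y: (2 + s2) * y + 4 * math.exp(y) - 1,
    lambda y: (2 + s2) + 4 * math.exp(y),
    0.0,
)

print(round(y2_a, 7), round(y2_b, 7))
```

Both functions are strictly increasing, so each equation has a single real root and the iteration converges from these starting points.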


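7. Let (t_1, y_1) and (t_2, y_2) be in D, with a <= t_1 <= b, a <= t_2 <= b, -\infty < y_1 < \infty, and -\infty < y_2 < \infty. For 0 <= \lambda <= 1, we have
(1 - \lambda)a <= (1 - \lambda)t_1 <= (1 - \lambda)b and \lambda a <= \lambda t_2 <= \lambda b. Hence, a = (1 - \lambda)a + \lambda a <= (1 - \lambda)t_1 + \lambda t_2 <= (1 - \lambda)b + \lambda b = b. Also,
-\infty < (1 - \lambda)y_1 + \lambda y_2 < \infty, so D is convex.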


9. a. Since y' = f(t, y(t)), we have

\int_a^t y'(z) dz = \int_a^t f(z, y(z)) dz.

So y(t) - y(a) = \int_a^t f(z, y(z)) dz and y(t) = \alpha + \int_a^t f(z, y(z)) dz.
The iterative method follows from this equation.

b. We have y_0(t) = 1, y_1(t) = 1 + (1/2)t^2, y_2(t) = 1 + (1/2)t^2 - (1/6)t^3, and y_3(t) = 1 + (1/2)t^2 - (1/6)t^3 + (1/24)t^4.
c. We have y(t) = 1 + (1/2)t^2 - (1/6)t^3 + (1/24)t^4 - (1/120)t^5 + ...
Exercise Set 5.2 (Page 273)


1. Euler's method gives the approximations in the following table.

a. i t_i w_i y(t_i)                     b. i t_i w_i y(t_i)
1 0.500 0.0000000 0.2836165            1 2.500 2.0000000 1.8333333
2 1.000 1.1204223 3.2190993            2 3.000 2.6250000 2.5000000
c. i t_i w_i y(t_i)                     d. i t_i w_i y(t_i)
1 1.250 2.7500000 2.7789294            1 0.250 1.2500000 1.3291498
2 1.500 3.5500000 3.6081977            2 0.500 1.6398053 1.7304898
3 1.750 4.3916667 4.4793276            3 0.750 2.0242547 2.0414720
4 2.000 5.2690476 5.3862944            4 1.000 2.2364573 2.1179795
3. a. t Actual Error Error bound        b. t Actual Error Error bound
0.5 0.2836165 11.3938                  2.5 0.166667 0.429570
1.0 2.0986771 42.3654                  3.0 0.125000 1.59726
c. t Actual Error Error bound           d. t Actual Error
1.25 0.0289294 0.0355032               0.25 0.0791498
1.50 0.0581977 0.0810902               0.50 0.0906844
1.75 0.0876610 0.139625                0.75 0.0172174
2.00 0.117247 0.214785                 1.00 0.118478

For Part (d), error bound formula (5.10) cannot be applied since L = 0.


5. Euler’s method gives the approximations in the following tables. 


a. i t_i w_i y(t_i)                     b. i t_i w_i y(t_i)
2 1.200 1.0082645 1.0149523            2 1.400 0.4388889 0.4896817
4 1.400 1.0385147 1.0475339            4 1.800 1.0520380 1.1994386
6 1.600 1.0784611 1.0884327            6 2.200 1.8842608 2.2135018
8 1.800 1.1232621 1.1336536            8 2.600 3.0028372 3.6784753
10 2.000 1.1706516 1.1812322           10 3.000 4.5142774 5.8741000
c. i t_i w_i y(t_i)                     d. i t_i w_i y(t_i)
2 0.400 -1.6080000 -1.6200510          2 0.2 0.1083333 0.1626265
4 0.800 -1.3017370 -1.3359632          4 0.4 0.1620833 0.2051118
6 1.200 -1.1274909 -1.1663454          6 0.6 0.3455208 0.3765957
8 1.600 -1.0491191 -1.0783314          8 0.8 0.6213802 0.6461052
10 2.000 -1.0181518 -1.0359724         10 1.0 0.9803451 1.0022460
7. The actual errors for the approximations in Exercise 3 are in the following tables.
a. t Actual Error    b. t Actual Error    c. t Actual Error    d. t Actual Error
1.2 0.0066879       1.4 0.0507928       0.4 0.0120510       0.2 0.0542931
1.5 0.0095942       2.0 0.2240306       1.0 0.0391546       0.5 0.0363200
1.7 0.0102229       2.4 0.4742818       1.4 0.0349030       0.7 0.0273054
2.0 0.0105806       3.0 1.3598226       2.0 0.0178206       1.0 0.0219009
9. Euler's method gives the approximations in the following table.


a. i t_i w_i y(t_i)
1 1.1 0.271828 0.345920
5 1.5 3.18744 3.96767
6 1.6 4.62080 5.72096
9 1.9 11.7480 14.3231
10 2.0 15.3982 18.6831

b. Linear interpolation gives the approximations in the following table. 


t Approximation y(t) Error 
1.04 0.108731 0.119986 0.01126 
1.55 3.90412 4.78864 0.8845 
1.97 14.3031 17.2793 2.976 


c. h < 0.00064 
11. a. Euler's method produces the following approximation to y(5) = 5.00674.
    | h = 0.2 | h = 0.1 | h = 0.05
w_N | 5.00377 | 5.00515 | 5.00592

b. h = \sqrt{2 x 10^{-6}} ≈ 0.0014142.

13. a. 1.021957 = y(1.25) ≈ 1.014978, 1.164390 = y(1.93) ≈ 1.153902
b. 1.924962 = y(2.1) ≈ 1.660756, 4.394170 = y(2.75) ≈ 3.526160
c. -1.138277 = y(1.3) ≈ -1.103618, -1.041267 = y(1.93) ≈ -1.022283
d. 0.3140018 = y(0.54) ≈ 0.2828333, 0.8866318 = y(0.94) ≈ 0.8665521

15. a. h = 10^{-n/2}
b. The minimal error is 10^{-n/2}(e - 1) + 5e10^{-n-1}.
c. t   w(h = 0.1) w(h = 0.01) y(t)    Error (n = 8)
0.5 0.40951    0.39499     0.39347 1.5 x 10^{-4}
1.0 0.65132    0.63397     0.63212 3.1 x 10^{-4}

17. b. w_50 = 0.10430 ≈ p(50)
c. Since p(t) = 1 - 0.99e^{-0.002t}, p(50) = 0.10421.
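The values in Exercise 17 can be reproduced with a short Euler loop. This sketch rests on two assumptions that are inferred rather than stated here: the differential equation is p' = 0.002(1 - p) with p(0) = 0.01 (obtained by differentiating the stated solution), and the step size is h = 1 so that w_50 approximates p(50).

```python
import math


def euler(f, t0, w0, h, steps):
    """Euler's method: w_{i+1} = w_i + h f(t_i, w_i)."""
    t, w = t0, w0
    for _ in range(steps):
        w += h * f(t, w)
        t += h
    return w


def f(t, p):
    # Assumed right-hand side, consistent with p(t) = 1 - 0.99 e^{-0.002 t}.
    return 0.002 * (1 - p)


w50 = euler(f, 0.0, 0.01, 1.0, 50)          # assumed step size h = 1
p50 = 1 - 0.99 * math.exp(-0.002 * 50)      # exact solution at t = 50

print(round(w50, 5), round(p50, 5))
```

The small gap between the two numbers is the accumulated global error of the first-order method over 50 steps.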


Exercise Set 5.3 (Page 281) 


1. a. t_i w_i y(t_i)                    b. t_i w_i y(t_i)
0.50 0.12500000 0.28361652             2.50 1.75000000 1.83333333
1.00 2.02323897 3.21909932             3.00 2.42578125 2.50000000

c. t_i w_i y(t_i)                       d. t_i w_i y(t_i)
1.25 2.78125000 2.77892944             0.25 1.34375000 1.32914981
1.50 3.61250000 3.60819766             0.50 1.77218707 1.73048976
1.75 4.48541667 4.47932763             0.75 2.11067606 2.04147203
2.00 5.39404762 5.38629436             1.00 2.20164395 2.11797955
3. a. t_i w_i y(t_i)                    b. t_i w_i y(t_i)
0.50 0.25781250 0.28361652             2.50 1.81250000 1.83333333
1.00 3.05529474 3.21909932             3.00 2.48591644 2.50000000
c. t_i w_i y(t_i)                       d. t_i w_i y(t_i)
1.25 2.77897135 2.77892944             0.25 1.32893880 1.32914981
1.50 3.60826562 3.60819766             0.50 1.72966730 1.73048976
1.75 4.47941561 4.47932763             0.75 2.03993417 2.04147203
2.00 5.38639966 5.38629436             1.00 2.11598847 2.11797955
5. a. Order 2                           b. Order 2
i t_i w_i y(t_i)                        i t_i w_i y(t_i)
1 1.1 1.214999 1.215886                1 0.5 0.5000000 0.5158868
2 1.2 1.465250 1.467570                2 1.0 1.076858 1.091818
c. Order 2                              d. Order 2
i t_i w_i y(t_i)                        i t_i w_i y(t_i)
1 1.5 -2.000000 -1.500000              1 0.25 1.093750 1.087088
2 2.0 -1.777776 -1.333333              2 0.50 1.312319 1.289805
3 2.5 -1.585732 -1.250000              3 0.75 1.538468 1.513490
4 3.0 -1.458882 -1.200000              4 1.0 1.720480 1.701870
7. a. Order 4                           b. Order 4
i t_i w_i y(t_i)                        i t_i w_i y(t_i)
1 1.1 1.215883 1.215886                1 0.5 0.5156250 0.5158868
2 1.2 1.467561 1.467570                2 1.0 1.091267 1.091818
c. Order 4                              d. Order 4
i t_i w_i y(t_i)                        i t_i w_i y(t_i)
1 1.5 -2.000000 -1.500000              1 0.25 1.086426 1.087088
2 2.0 -1.679012 -1.333333              2 0.50 1.288245 1.289805
3 2.5 -1.484493 -1.250000              3 0.75 1.512576 1.513490
4 3.0 -1.374440 -1.200000              4 1.0 1.701494 1.701870


9. a. Taylor’s method of order two gives the results in the following table. 


i tj Wi yt) 
1 1.1 0.3397852 0.3459199 
5 1.5 3.910985 3.967666 
6 1.6 5.643081 5.720962 
9 1.9 14.15268 14.32308 
10 2.0 18.46999 18.68310 


b. Linear interpolation gives y(1.04) ≈ 0.1359139, y(1.55) ≈ 4.777033, and y(1.97) ≈ 17.17480. Actual values are
y(1.04) = 0.1199875, y(1.55) = 4.788635, and y(1.97) = 17.27930.
c. Taylor's method of order four gives the results in the following table.


i t_i w_i
1 1.1 0.3459127
5 1.5 3.967603
6 1.6 5.720875
9 1.9 14.32290
10 2.0 18.68287


d. Cubic Hermite interpolation gives y(1.04) ≈ 0.1199704, y(1.55) ≈ 4.788527, and y(1.97) ≈ 17.27904.


11. a. 


i t Order 2 Order 4 
2 0.2 5.86595 5.86433 
5 0.5 2.82145 2.81789 
7 0.7 0.84926 0.84455 
10 1.0 —2.08606 —2.09015 
0.8 s 


Exercise Set 5.4 (Page 291) 
1. a. t Modified Euler y(t)             b. t Modified Euler y(t)
0.5 0.5602111 0.2836165                2.5 1.8125000 1.8333333
1.0 5.3014898 3.2190993                3.0 2.4815531 2.5000000
c. t Modified Euler y(t)                d. t Modified Euler y(t)
1.25 2.7750000 2.7789294               0.25 1.3199027 1.3291498
1.50 3.6008333 3.6081977               0.50 1.7070300 1.7304898
1.75 4.4688294 4.4793276               0.75 2.0053560 2.0414720
2.00 5.3728586 5.3862944               1.00 2.0770789 2.1179795
3. a. Modified Euler                    b. Modified Euler
t_i w_i y(t_i)                          t_i w_i y(t_i)
1.2 1.0147137 1.0149523                1.4 0.4850495 0.4896817
1.5 1.0669093 1.0672624                2.0 1.6384229 1.6612818
1.7 1.1102751 1.1106551                2.4 2.8250651 2.8765514
2.0 1.1808345 1.1812322                3.0 5.7075699 5.8741000
c. Modified Euler                       d. Modified Euler
t_i w_i y(t_i)                          t_i w_i y(t_i)
0.4 -1.6229206 -1.6200510              0.2 0.1742708 0.1626265
1.0 -1.2442903 -1.2384058              0.5 0.2878200 0.2773617
1.4 -1.1200763 -1.1146484              0.7 0.5088359 0.5000658
2.0 -1.0391938 -1.0359724              1.0 1.0096377 1.0022460
5. a. t Midpoint y(t)                   b. t Midpoint y(t)
0.5 0.2646250 0.2836165                2.5 1.7812500 1.8333333
1.0 3.1300023 3.2190993                3.0 2.4550638 2.5000000
c. t Midpoint y(t)                      d. t Midpoint y(t)
1.25 2.7777778 2.7789294               0.25 1.3337962 1.3291498
1.50 3.6060606 3.6081977               0.50 1.7422854 1.7304898
1.75 4.4763015 4.4793276               0.75 2.0596374 2.0414720
2.00 5.3824398 5.3862944               1.00 2.1385560 2.1179795
7. a. Midpoint b. Midpoint 
tj Wi yi) tj Wj yi) 
1.2 1.0153257 1.0149523 1.4 0.4861770 0.4896817 
1.5 1.0677427 1.0672624 2.0 1.6438889 1.6612818 
1.7 1.1111478 1.1106551 2.4 2.8364357 2.8765514 
2.0 1.1817275 1.1812322 3.0 5.7386475 5.8741000 
c. Midpoint d. Midpoint 
tj Wi yi) tj Wj yi) 
0.4 —1.6192966 —1.6200510 0.2 0.1722396 0.1626265 
1.0 —1.2402470 —1.2384058 0.5 0.2848046 0.2773617 
1.4 —1.1175165 —1.1146484 0.7 0.5056268 0.5000658 
2.0 —1.0382227 —1.0359724 1.0 1.0063347 1.0022460 
9. a. Heun b. Heun 
tj Wi y(t) tj Wj y(t) 
0.50 0.2710885 0.2836165 2.50 1.8464828 1.8333333 
1.00 3.1327255 3.2190993 3.00 2.5094123 2.5000000 
Cc. Heun d. Heun 
tj Wi yi) tj Wj y(t) 
1.25 2.7788462 2.7789294 0.25 1.3295717 1.3291498 
1.50 3.6080529 3.6081977 0.50 1.7310350 1.7304898 
1.75 4.4791319 4.4793276 0.75 2.0417476 2.0414720 
2.00 5.3860533 5.3862944 1.00 2.1176975 2.1179795 
11. a. Heun b. Heun 
t_i w_i y(t_i) t_i w_i y(t_i)
1.2 1.0149305 1.0149523 1.4 0.4895074 0.4896817
1.7 1.1106289 1.1106551 2.4 2.8741491 2.8765514
2.0 1.1812064 1.1812322 3.0 5.8652189 5.8741000
c. Heun d. Heun
t_i w_i y(t_i) t_i w_i y(t_i)
0.4 —1.6201023 —1.6200510 0.2 0.1614497 0.1626265 
1.0 -1.2383500 -1.2384058 0.5 0.2765100 0.2773617
1.4 —1.1144745 —1.1146484 0.7 0.4994538 0.5000658 
2.0 —1.0357989 —1.0359724 1.0 1.0018114 1.0022460 
13. a. Runge-Kutta b. Runge-Kutta 
t_i w_i y(t_i) t_i w_i y(t_i)
0.5 0.2969975 0.2836165 2.5 1.8333234 1.8333333 
1.0 3.3143118 3.2190993 3.0 2.4999712 2.5000000 
c. Runge-Kutta d. Runge-Kutta 
t_i w_i y(t_i) t_i w_i y(t_i)
1.25 2.7789095 2.7789294 0.25 1.3291650 1.3291498
1.50 3.6081647 3.6081977 0.50 1.7305336 1.7304898
1.75 4.4792846 4.4793276 0.75 2.0415436 2.0414720 
2.00 5.3862426 5.3862944 1.00 2.1180636 2.1179795 
15. a. Runge-Kutta b. Runge-Kutta 
t_i w_i y(t_i) t_i w_i y(t_i)
1.2 1.0149520 1.0149523 1.4 0.4896842 0.4896817
1.5 1.0672620 1.0672624 2.0 1.6612651 1.6612818
1.7 1.1106547 1.1106551 2.4 2.8764941 2.8765514
2.0 1.1812319 1.1812322 3.0 5.8738386 5.8741000 
c. Runge-Kutta d. Runge-Kutta 
t_i w_i y(t_i) t_i w_i y(t_i)
0.4 —1.6200576 —1.6200510 0.2 0.1627655 0.1626265 
1.0 —1.2384307 —1.2384058 0.5 0.2774767 0.2773617 
1.4 —1.1146769 —1.1146484 0.7 0.5001579 0.5000658 
2.0 —1.0359922 —1.0359724 1.0 1.0023207 1.0022460 
17. a. 1.0221167 ≈ y(1.25) = 1.0219569, 1.1640347 ≈ y(1.93) = 1.1643901
b. 1.9086500 ≈ y(2.1) = 1.9249616, 4.3105913 ≈ y(2.75) = 4.3941697
c. -1.1461434 ≈ y(1.3) = -1.1382768, -1.0454854 ≈ y(1.93) = -1.0412665
d. 0.3271470 ≈ y(0.54) = 0.3140018, 0.8967073 ≈ y(0.94) = 0.8866318
19. a. 1.0227863 ≈ y(1.25) = 1.0219569, 1.1649247 ≈ y(1.93) = 1.1643901
b. 1.9153749 ≈ y(2.1) = 1.9249616, 4.3312939 ≈ y(2.75) = 4.3941697
c. -1.1432070 ≈ y(1.3) = -1.1382768, -1.0443743 ≈ y(1.93) = -1.0412665
d. 0.3240839 ≈ y(0.54) = 0.3140018, 0.8934152 ≈ y(0.94) = 0.8866318
21. a. 1.02235985 ≈ y(1.25) = 1.0219569, 1.16440371 ≈ y(1.93) = 1.1643901
b. 1.88084805 ≈ y(2.1) = 1.9249616, 4.40842612 ≈ y(2.75) = 4.3941697
c. -1.14034696 ≈ y(1.3) = -1.1382768, -1.04182026 ≈ y(1.93) = -1.0412665
d. 0.31625699 ≈ y(0.54) = 0.3140018, 0.88866134 ≈ y(0.94) = 0.8866318
23. a. 1.0223826 ≈ y(1.25) = 1.0219569, 1.1644292 ≈ y(1.93) = 1.1643901
b. 1.9373672 ≈ y(2.1) = 1.9249616, 4.4134745 ≈ y(2.75) = 4.3941697
c. -1.1405252 ≈ y(1.3) = -1.1382768, -1.0420211 ≈ y(1.93) = -1.0412665
d. 0.31716526 ≈ y(0.54) = 0.3140018, 0.88919730 ≈ y(0.94) = 0.8866318
25. a. 1.0219569 = y(1.25) ≈ 1.0219550, 1.1643902 = y(1.93) ≈ 1.1643898
b. 1.9249617 = y(2.10) ≈ 1.9249217, 4.3941697 = y(2.75) ≈ 4.3939943
c. -1.1382768 = y(1.3) ≈ -1.1383036, -1.0412666 = y(1.93) ≈ -1.0412862
d. 0.31400184 = y(0.54) ≈ 0.31410579, 0.88663176 = y(0.94) ≈ 0.88670653
27. With f(t, y) = -y + t + 1, we have both

$$w_i + h f\left(t_i + \frac{h}{2},\; w_i + \frac{h}{2} f(t_i, w_i)\right) = w_i\left(1 - h + \frac{h^2}{2}\right) + t_i\left(h - \frac{h^2}{2}\right) + h$$

and

$$w_i + \frac{h}{2}\left[f(t_i, w_i) + f(t_{i+1},\; w_i + h f(t_i, w_i))\right] = w_i\left(1 - h + \frac{h^2}{2}\right) + t_i\left(h - \frac{h^2}{2}\right) + h,$$

because f(t, y) is linear in both variables.
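The identity in Exercise 27 can be checked numerically. The following is a minimal sketch, assuming the initial condition y(0) = 1 and the step sizes shown (neither is stated in this answer); the function names are illustrative only. It steps the Midpoint and Modified Euler methods side by side and compares both with the closed form w_i(1 - h + h^2/2) + t_i(h - h^2/2) + h.

```python
def f(t, y):
    # Right-hand side from Exercise 27: f(t, y) = -y + t + 1
    return -y + t + 1.0


def midpoint_step(t, w, h):
    # One step of the Midpoint method
    return w + h * f(t + h / 2.0, w + (h / 2.0) * f(t, w))


def modified_euler_step(t, w, h):
    # One step of the Modified Euler method
    return w + (h / 2.0) * (f(t, w) + f(t + h, w + h * f(t, w)))


def closed_form_step(t, w, h):
    # Common closed form derived in the answer to Exercise 27
    return w * (1.0 - h + h * h / 2.0) + t * (h - h * h / 2.0) + h


def largest_disagreement(h=0.1, steps=10, w0=1.0):
    # w0 = 1.0 and t starting at 0 are assumptions made for this sketch
    t, w = 0.0, w0
    worst = 0.0
    for _ in range(steps):
        a = midpoint_step(t, w, h)
        b = modified_euler_step(t, w, h)
        c = closed_form_step(t, w, h)
        worst = max(worst, abs(a - b), abs(a - c))
        t, w = t + h, a
    return worst


print(largest_disagreement())
```

The printed value should be at the level of floating-point roundoff, since the three expressions are algebraically identical for this linear f.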


29. In 0.2 s we have approximately 2099 units of KOH. 


Exercise Set 5.5 (Page 300) 


1. The Runge-Kutta-Fehlberg Algorithm gives the results in the following tables. 


a. i t_i w_i h_i y_i
1 0.2093900 0.0298184 0.2093900 0.0298337
3 0.5610469 0.4016438 0.1777496 0.4016860
5 0.8387744 1.5894061 0.1280905 1.5894600
7 1.0000000 3.2190497 0.0486737 3.2190993


b. i t_i w_i h_i y_i


1 2.2500000 1.4499988 0.2500000 1.4500000 
2 2.5000000 1.8333332 0.2500000 1.8333333 
3 2.7500000 2.1785718 0.2500000 2.1785714 
4 3.0000000 2.5000005 0.2500000 2.5000000 


c. i t_i w_i h_i y_i


1 1.2500000 2.7789299 0.2500000 2.7789294 
2 1.5000000 3.6081985 0.2500000 3.6081977 
3 1.7500000 4.4793288 0.2500000 4.4793276 
4 2.0000000 5.3862958 0.2500000 5.3862944 


d. i t_i w_i h_i y_i


1 0.2500000 1.3291478 0.2500000 1.3291498 
2 0.5000000 1.7304857 0.2500000 1.7304898 
3 0.7500000 2.0414669 0.2500000 2.0414720 
4 1.0000000 2.1179750 0.2500000 2.1179795
3. The Runge-Kutta-Fehlberg Algorithm gives the results in the following tables.


a. i t_i w_i h_i y_i
1 1.1101946 1.0051237 0.1101946 1.0051237
5 1.7470584 1.1213948 0.2180472 1.1213947 
7 2.3994350 1.2795396 0.3707934 1.2795395 
11 4.0000000 1.6762393 0.1014853 1.6762391 
b. i t_i w_i h_i y_i
4 1.5482238 0.7234123 0.1256486 0.7234119 
7 1.8847226 1.3851234 0.1073571 1.3851226 
10 2.1846024 2.1673514 0.0965027 2.1673499 
16 2.6972462 4.1297939 0.0778628 4.1297904 
21 3.0000000 5.8741059 0.0195070 5.8741000 
c. i t_i w_i h_i y_i
1 0.1633541 —1.8380836 0.1633541 —1.8380836 
3 0.7585763 —1.3597623 0.1266248 —1.3597624 
9 1.1930325 —1.1684827 0.1048224 —1.1684830 
13 1.6229351 —1.0749509 0.1107510 —1.0749511 
17 2.1074733 —1.0291158 0.1288897 —1.0291161 
23 3.0000000 —1.0049450 0.1264618 —1.0049452 
d. i t_i w_i h_i y_i
1 0.3986051 0.3108201 0.3986051 0.3108199
3 0.9703970 0.2221189 0.2866710 0.2221186 
5 1.5672905 0.1133085 0.3042087 0.1133082 
8 2.0000000 0.0543454 0.0902302 0.0543455 


5. a. The number of infectives is y(30) ≈ 80295.7.


b. The limiting value for the number of infectives for this model is lim_{t→∞} y(t) = 100,000.


Exercise Set 5.6 (Page 314) 


1. The Adams-Bashforth methods give the results in the following tables. 




a. t 2-step 3-step 4-step 5-step y(t)
0.2 0.0268128 0.0268128 0.0268128 0.0268128 0.0268128
0.4 0.1200522 0.1507778 0.1507778 0.1507778 0.1507778 
0.6 0.4153551 0.4613866 0.4960196 0.4960196 0.4960196 
0.8 1.1462844 1.2512447 1.2961260 1.3308570 1.3308570 
1.0 2.8241683 3.0360680 3.1461400 3.1854002 3.2190993 

b. t 2-step 3-step 4-step 5-step y(t)
2.2 1.3666667 1.3666667 1.3666667 1.3666667 1.3666667 
2.4 1.6750000 1.6857143 1.6857143 1.6857143 1.6857143 
2.6 1.9632431 1.9794407 1.9750000 1.9750000 1.9750000 
2.8 2.2323184 2.2488759 2.2423065 2.2444444 2.2444444 
3.0 2.4884512 2.5051340 2.4980306 2.5011406 2.5000000
c. t 2-step 3-step 4-step 5-step y(t)
1.2 2.6187859 2.6187859 2.6187859 2.6187859 2.6187859
1.4 3.2734823 3.2710611 3.2710611 3.2710611 3.2710611
1.6 3.9567107 3.9514231 3.9520058 3.9520058 3.9520058
1.8 4.6647738 4.6569191 4.6582078 4.6580160 4.6580160 
2.0 5.3949416 5.3848058 5.3866452 5.3862177 5.3862944 


d. t 2-step 3-step 4-step 5-step y(t)
0.2 1.2529306 1.2529306 1.2529306 1.2529306 1.2529306 
0.4 1.5986417 1.5712255 1.5712255 1.5712255 1.5712255 
0.6 1.9386951 1.8827238 1.8750869 1.8750869 1.8750869 
0.8 2.1766821 2.0844122 2.0698063 2.0789180 2.0789180 


1.0 2.2369407 2.1115540 2.0998117 2.1180642 2.1179795 


3. The Adams-Bashforth methods give the results in the following tables. 


a. t 2-step 3-step 4-step 5-step y(t) 
1.2 1.0161982 1.0149520 1.0149520 1.0149520 1.0149523 
1.4 1.0497665 1.0468730 1.0477278 1.0475336 1.0475339 
1.6 1.0910204 1.0875837 1.0887567 1.0883045 1.0884327 
1.8 1.1363845 1.1327465 1.1340093 1.1334967 1.1336536 
2.0 1.1840272 1.1803057 1.1815967 1.1810689 1.1812322 
b. t 2-step 3-step 4-step 5-step y(t)
1.4 0.4867550 0.4896842 0.4896842 0.4896842 0.4896817 
1.8 1.1856931 1.1982110 1.1990422 1.1994320 1.1994386 
2.2 2.1753785 2.2079987 2.2117448 2.2134792 2.2135018 
2.6 3.5849181 3.6617484 3.6733266 3.6777236 3.6784753 
3.0 5.6491203 5.8268008 5.8589944 5.8706101 5.8741000 
c. t 2-step 3-step 4-step 5-step y(t)
0.5 —1.5357010 —1.5381988 —1.5379372 —1.5378676 —1.5378828 
1.0 —1.2374093 —1.2389605 —1.2383734 —1.2383693 —1.2384058 
1.5 —1.0952910 —1.0950952 —1.0947925 —1.0948481 —1.0948517 
2.0 —1.0366643 —1.0359996 —1.0359497 —1.0359760 —1.0359724 
d. t 2-step 3-step 4-step 5-step y(t)
0.2 0.1739041 0.1627655 0.1627655 0.1627655 0.1626265
0.4 0.2144877 0.2026399 0.2066057 0.2052405 0.2051118 
0.6 0.3822803 0.3747011 0.3787680 0.3765206 0.3765957 
0.8 0.6491272 0.6452640 0.6487176 0.6471458 0.6461052 
1.0 1.0037415 1.0020894 1.0064121 1.0073348 1.0022460
5. a. t_i w_i y(t_i)
0.2 0.0269059 0.0268128
0.4 0.1510468 0.1507778
0.6 0.4966479 0.4960196
0.8 1.3408657 1.3308570
1.0 3.2450881 3.2190993

b. t_i w_i y(t_i)
2.2 1.3666610 1.3666667
2.4 1.6857079 1.6857143
2.6 1.9749941 1.9750000
2.8 2.2446995 2.2444444
3.0 2.5003083 2.5000000

c. t_i w_i y(t_i)
1.2 2.6187787 2.6187859
1.4 3.2710491 3.2710611
1.6 3.9519900 3.9520058
1.8 4.6579968 4.6580160
2.0 5.3862715 5.3862944

d. t_i w_i y(t_i)
0.2 1.2529350 1.2529306
0.4 1.5712383 1.5712255
0.6 1.8751097 1.8750869
0.8 2.0796618 2.0789180
1.0 2.1192575 2.1179795


7. The Adams Fourth-order Predictor-Corrector Algorithm gives the results in the following tables.


a. t_i w_i y(t_i)
1.2 1.0149520 1.0149523
1.4 1.0475227 1.0475339
1.6 1.0884141 1.0884327
1.8 1.1336331 1.1336536
2.0 1.1812112 1.1812322

b. t_i w_i y(t_i)
1.4 0.4896842 0.4896817
1.8 1.1994245 1.1994386
2.2 2.2134701 2.2135018
2.6 3.6784144 3.6784753
3.0 5.8739518 5.8741000

c. t_i w_i y(t_i)
0.5 -1.5378788 -1.5378828
1.0 -1.2384134 -1.2384058
1.5 -1.0948609 -1.0948517
2.0 -1.0359757 -1.0359724

d. t_i w_i y(t_i)
0.2 0.1627655 0.1626265
0.4 0.2048557 0.2051118
0.6 0.3762804 0.3765957
0.8 0.6458949 0.6461052
1.0 1.0021372 1.0022460


9. a. With h = 0.01, the three-step Adams-Moulton method gives the values in the following table. 


i t_i w_i
10 0.1 1.317218 
20 0.2 1.784511 


b. Newton's method will reduce the number of iterations per step from three to two, using the stopping criterion

$$\left|w_i^{(k)} - w_i^{(k-1)}\right| \le 10^{-6}.$$

15. To derive Milne's method, integrate $y'(t) = f(t, y(t))$ on the interval $[t_{i-3}, t_{i+1}]$ to obtain

$$y(t_{i+1}) - y(t_{i-3}) = \int_{t_{i-3}}^{t_{i+1}} f(t, y(t))\, dt.$$

Using the open Newton-Cotes formula (4.31) on page 201, we have

$$y(t_{i+1}) - y(t_{i-3}) = \frac{4h}{3}\left[2f(t_i, y(t_i)) - f(t_{i-1}, y(t_{i-1})) + 2f(t_{i-2}, y(t_{i-2}))\right] + \frac{14h^5}{45} f^{(4)}(\xi, y(\xi)).$$

The difference equation becomes

$$w_{i+1} = w_{i-3} + \frac{h\left[8f(t_i, w_i) - 4f(t_{i-1}, w_{i-1}) + 8f(t_{i-2}, w_{i-2})\right]}{3},$$

with local truncation error

$$\tau_{i+1}(h) = \frac{14h^4 y^{(5)}(\xi)}{45}.$$
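As an illustration of the explicit Milne formula w_{i+1} = w_{i-3} + (4h/3)[2f_i - f_{i-1} + 2f_{i-2}], here is a minimal sketch. The test problem y' = -y + t + 1, y(0) = 1 (exact solution t + e^{-t}), the step size, and the use of exact starting values are assumptions chosen for the demonstration; they are not part of the exercise.

```python
import math


def milne_explicit(f, t0, h, start, n_steps):
    """Advance with Milne's explicit four-step formula.

    start holds the four starting values w_0, w_1, w_2, w_3
    (supplied by the caller, e.g. from a one-step method).
    """
    t = [t0 + i * h for i in range(n_steps + 1)]
    w = list(start[:4])
    for i in range(3, n_steps):
        w_next = w[i - 3] + (4.0 * h / 3.0) * (
            2.0 * f(t[i], w[i])
            - f(t[i - 1], w[i - 1])
            + 2.0 * f(t[i - 2], w[i - 2])
        )
        w.append(w_next)
    return t, w


def demo_error(h=0.1, n_steps=10):
    # Assumed test problem: y' = -y + t + 1, y(0) = 1, y(t) = t + exp(-t)
    f = lambda t, y: -y + t + 1.0
    exact = lambda t: t + math.exp(-t)
    start = [exact(i * h) for i in range(4)]  # exact starting values
    t, w = milne_explicit(f, 0.0, h, start, n_steps)
    return abs(w[-1] - exact(t[-1]))


print(demo_error())
```

In practice the starting values would come from a one-step method such as Runge-Kutta, and the formula is usually paired with a corrector because the explicit method alone is only weakly stable.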


Exercise Set 5.7 (Page 320) 


1. The Adams Variable Step-Size Predictor-Corrector Algorithm gives the results in the following tables. 


a. i t_i w_i h_i y_i
1 0.04275596 0.00096891 0.04275596 0.00096887
5 0.22491460 0.03529441 0.05389076 0.03529359 
12 0.60214994 0.50174348 0.05389076 0.50171761 


17 0.81943926 1.45544317 0.04345786 1.45541453 
22 0.99830392 3.19605697 0.03577293 3.19602842 
26 1.00000000 3.21912776 0.00042395 3.21909932 
b. i t_i w_i h_i y_i
1 2.06250000 1.12132350 0.06250000 1.12132353 
5 2.31250000 1.55059834 0.06250000 1.55059524 
9 2.62471924 2.00923157 0.09360962 2.00922829 
13 2.99915773 2.49895243 0.09360962 2.49894707 
17 3.00000000 2.50000535 0.00021057 2.50000000
c. i t_i w_i h_i y_i
1 1.06250000 2.18941363 0.06250000 2.18941366 
4 1.25000000 2.77892931 0.06250000 2.77892944
8 1.85102559 4.84179835 0.15025640 4.84180141 
12 2.00000000 5.38629105 0.03724360 5.38629436 
d. i t_i w_i h_i y_i
1 0.06250000 1.06817960 0.06250000 1.06817960 
5 0.31250000 1.42861668 0.06250000 1.42861361 
10 0.62500000 1.90768386 0.06250000 1.90767015 
13 0.81250000 2.08668486 0.06250000 2.08666541 
16 1.00000000 2.11800208 0.06250000 2.11797955 


3. The following tables list representative results from the Adams Variable Step-Size Predictor-Corrector Algorithm. 


a. i t_i w_i h_i y_i
5 1.10431651 1.00463041 0.02086330 1.00463045 
15 1.31294952 1.03196889 0.02086330 1.03196898 
25 1.59408142 1.08714711 0.03122028 1.08714722
35 2.00846205 1.18327922 0.04824992 1.18327937 
45 2.66272188 1.34525123 0.07278716 1.34525143 
52 3.40193112 1.52940900 0.11107035 1.52940924 


57 4.00000000 1.67623887 0.12174963 1.67623914 




b. i t_i w_i h_i y_i
5 1.18519603 0.20333499 0.03703921 0.20333497
15 1.55558810 0.73586642 0.03703921 0.73586631 
25 1.92598016 1.48072467 0.03703921 1.48072442 
35 2.29637222 2.51764797 0.03703921 2.51764743 
45 2.65452689 3.92602442 0.03092051 3.92602332 
55 2.94341188 5.50206466 0.02584049 5.50206279 
61 3.00000000 5.87410206 0.00122679 5.87409998 
c. i t_i w_i h_i y_i
5 0.16854008 -1.83303780 0.03370802 -1.83303783
17 0.64833341 —1.42945306 0.05253230 —1.42945304 
27 1.06742915 —1.21150951 0.04190957 —1.21150932 
41 1.75380240 -1.05819340 0.06681937 -1.05819325
51 2.50124702 —1.01335240 0.07474446 —1.01335258 
61 3.00000000 —1.00494507 0.01257155 —1.00494525 
d. i t_i w_i h_i y_i
5 0.28548652 0.32153668 0.05709730 0.32153674
15 0.85645955 0.24281066 0.05709730 0.24281095
20 1.35101725 0.15096743 0.09891154 0.15096772
25 1.66282314 0.09815109 0.06236118 0.09815137
29 1.91226786 0.06418555 0.06236118 0.06418579 
33 2.00000000 0.05434530 0.02193303 0.05434551 
5. The current after 2 seconds is approximately i(2) = 8.693 amperes. 
Exercise Set 5.8 (Page 327) 
1. The Extrapolation Algorithm gives the results in the following tables. 
a. i t_i w_i h k y_i
1 0.25 0.04543132 0.25 3 0.04543123
2 0.50 0.28361684 0.25 3 0.28361652 
3 0.75 1.05257634 0.25 4 1.05257615 
4 1.00 3.21909944 0.25 4 3.21909932 
b. i t_i w_i h k y_i
1 2.25 1.44999987 0.25 3 1.45000000 
2 2.50 1.83333321 0.25 3 1.83333333 
3 2.75 2.17857133 0.25 3 2.17857143 
4 3.00 2.49999993 0.25 3 2.50000000 
c. i t_i w_i h k y_i
1 1.25 2.77892942 0.25 3 2.77892944
2 1.50 3.60819763 0.25 3 3.60819766
3 1.75 4.47932759 0.25 3 4.47932763
4 2.00 5.38629431 0.25 3 5.38629436
d. i t_i w_i h y_i
1 0.25 1.32914981 0.25 1.32914981
2 0.50 1.73048976 0.25 1.73048976
3 0.75 2.04147203 0.25 2.04147203
4 1.00 2.11797954 0.25 2.11797955

3. a. i t_i w_i h k y_i
1 1.50 1.06726237 0.50 4 1.06726235 
2 2.00 1.18123223 0.50 3 1.18123222 
3 2.50 1.30460372 0.50 3 1.30460371 
4 3.00 1.42951608 0.50 3 1.42951607 
5 3.50 1.55364771 0.50 3 1.55364770 
6 4.00 1.67623915 0.50 3 1.67623914 
b. i t_i w_i h k y_i
1 1.50 0.64387537 0.50 4 0.64387533 
2 2.00 1.66128182 0.50 5 1.66128176 
3 2.50 3.25801550 0.50 5 3.25801536 
4 3.00 5.87410027 0.50 5 5.87409998 
c. i t_i w_i h k y_i
1 0.50 —1.53788284 0.50 4 —1.53788284 
2 1.00 —1.23840584 0.50 5 —1.23840584 
3 1.50 —1.09485175 0.50 5 —1.09485175 
4 2.00 —1.03597242 0.50 5 —1.03597242 
5 2.50 —1.01338570 0.50 5 —1.01338570 
6 3.00 —1.00494526 0.50 4 —1.00494525 
d. i t_i w_i h k y_i
1 0.50 0.29875177 0.50 4 0.29875178 
2 1.00 0.21662642 0.50 4 0.21662642 
3 1.50 0.12458565 0.50 4 0.12458565 
4 2.00 0.05434552 0.50 4 0.05434551 


Exercise Set 5.9 (Page 337) 


1. The Runge-Kutta for Systems Algorithm gives the results in the following tables. 


a. t_i w_{1i} u_{1i} w_{2i} u_{2i}
0.200 2.12036583 2.12500839 1.50699185 1.51158743
0.400 4.44122776 4.46511961 3.24224021 3.26598528
0.600 9.73913329 9.83235869 8.16341700 8.25629549
0.800 22.67655977 23.00263945 21.34352778 21.66887674
1.000 55.66118088 56.73748265 56.03050296 57.10536209
b. t_i w_{1i} u_{1i} w_{2i} u_{2i}
0.500 0.95671390 0.95672798 -1.08381950 -1.08383310
1.000 1.30654440 1.30655930 -0.83295364 -0.83296776
1.500 1.34416716 1.34418117 -0.56980329 -0.56981634
2.000 1.14332436 1.14333672 -0.36936318 -0.36937457

c. t_i w_{1i} u_{1i} w_{2i} u_{2i} w_{3i} u_{3i}
0.5 0.70787076 0.70828683 -1.24988663 -1.25056425 0.39884862 0.39815702
1.0 -0.33691753 -0.33650854 -3.01764179 -3.01945051 -0.29932294 -0.30116868
1.5 -2.41332734 -2.41345688 -5.40523279 -5.40844686 -0.92346873 -0.92675778
2.0 -5.89479008 -5.89590551 -8.70970537 -8.71450036 -1.32051165 -1.32544426
d. t_i w_{1i} u_{1i} w_{2i} u_{2i} w_{3i} u_{3i}
0.2 1.38165297 1.38165325 1.00800000 1.00800000 -0.61833075 -0.61833075
0.5 1.90753116 1.90753184 1.12500000 1.12500000 -0.09090565 -0.09090566
0.7 2.25503524 2.25503620 1.34300000 1.34300000 0.26343971 0.26343970
1.0 2.83211921 2.83212056 2.00000000 2.00000000 0.88212058 0.88212056
3. The Runge-Kutta for Systems Algorithm gives the results in the following tables. 

a. t_i w_{1i} y_i b. t_i w_{1i} y_i
0.200 0.00015352 0.00015350 1.200 0.96152437 0.96152583 
0.500 0.00742968 0.00743027 1.500 0.77796897 0.77797237 
0.700 0.03299617 0.03299805 1.700 0.59373369 0.59373830 
1.000 0.17132224 0.17132880 2.000 0.27258237 0.27258872 

c. t_i w_{1i} y_i
1.000 3.73162695 3.73170445
2.000 11.31424573 11.31452924
3.000 34.04395688 34.04517155

d. t_i w_{1i} y_i
1.200 0.27273759 0.27273791
1.500 1.08849079 1.08849259
1.700 2.04353207 2.04353642
2.000 4.36156675 4.36157780
5. To approximate the solution of the mth-order system of first-order initial-value problems

$$u_j' = f_j(t, u_1, u_2, \ldots, u_m), \quad j = 1, 2, \ldots, m, \quad \text{for } a \le t \le b, \quad u_j(a) = \alpha_j, \quad j = 1, 2, \ldots, m$$

at (N + 1) equally spaced numbers in the interval [a, b]:

INPUT endpoints a, b; number of equations m; integer N; initial conditions $\alpha_1, \ldots, \alpha_m$.

OUTPUT approximations $w_{i,j}$ to $u_j(t_i)$.

Step 1 Set h = (b - a)/N; $t_0 = a$.
Step 2 For j = 1, 2, ..., m set $w_{0,j} = \alpha_j$.
Step 3 OUTPUT $(t_0, w_{0,1}, w_{0,2}, \ldots, w_{0,m})$.
Step 4 For i = 1, 2, 3 do Steps 5-11.
Step 5 For j = 1, 2, ..., m set
$k_{1,j} = h f_j(t_{i-1}, w_{i-1,1}, \ldots, w_{i-1,m})$.
Step 6 For j = 1, 2, ..., m set
$k_{2,j} = h f_j\left(t_{i-1} + \frac{h}{2}, w_{i-1,1} + \frac{1}{2}k_{1,1}, w_{i-1,2} + \frac{1}{2}k_{1,2}, \ldots, w_{i-1,m} + \frac{1}{2}k_{1,m}\right)$.
Step 7 For j = 1, 2, ..., m set
$k_{3,j} = h f_j\left(t_{i-1} + \frac{h}{2}, w_{i-1,1} + \frac{1}{2}k_{2,1}, w_{i-1,2} + \frac{1}{2}k_{2,2}, \ldots, w_{i-1,m} + \frac{1}{2}k_{2,m}\right)$.
Step 8 For j = 1, 2, ..., m set
$k_{4,j} = h f_j(t_{i-1} + h, w_{i-1,1} + k_{3,1}, w_{i-1,2} + k_{3,2}, \ldots, w_{i-1,m} + k_{3,m})$.
Step 9 For j = 1, 2, ..., m set
$w_{i,j} = w_{i-1,j} + (k_{1,j} + 2k_{2,j} + 2k_{3,j} + k_{4,j})/6$.
Step 10 Set $t_i = a + ih$.
Step 11 OUTPUT $(t_i, w_{i,1}, w_{i,2}, \ldots, w_{i,m})$.
Step 12 For i = 4, ..., N do Steps 13-16.
Step 13 Set $t_i = a + ih$.
Step 14 For j = 1, 2, ..., m set
$w_{i,j}^{(0)} = w_{i-1,j} + h[55 f_j(t_{i-1}, w_{i-1,1}, \ldots, w_{i-1,m}) - 59 f_j(t_{i-2}, w_{i-2,1}, \ldots, w_{i-2,m}) + 37 f_j(t_{i-3}, w_{i-3,1}, \ldots, w_{i-3,m}) - 9 f_j(t_{i-4}, w_{i-4,1}, \ldots, w_{i-4,m})]/24$.
Step 15 For j = 1, 2, ..., m set
$w_{i,j} = w_{i-1,j} + h[9 f_j(t_i, w_{i,1}^{(0)}, \ldots, w_{i,m}^{(0)}) + 19 f_j(t_{i-1}, w_{i-1,1}, \ldots, w_{i-1,m}) - 5 f_j(t_{i-2}, w_{i-2,1}, \ldots, w_{i-2,m}) + f_j(t_{i-3}, w_{i-3,1}, \ldots, w_{i-3,m})]/24$.
Step 16 OUTPUT $(t_i, w_{i,1}, w_{i,2}, \ldots, w_{i,m})$.
Step 17 STOP
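The steps of this algorithm can be sketched in Python as follows. This is an illustrative sketch rather than a reference implementation: the function names, the list-of-lists storage, and the harmonic-oscillator test problem at the end are assumptions introduced here. Results are returned instead of printed at each step.

```python
import math


def rk4_step(f, t, w, h):
    # Steps 5-9: one fourth-order Runge-Kutta step for a system
    m = len(w)
    k1 = [h * v for v in f(t, w)]
    k2 = [h * v for v in f(t + h / 2, [w[j] + k1[j] / 2 for j in range(m)])]
    k3 = [h * v for v in f(t + h / 2, [w[j] + k2[j] / 2 for j in range(m)])]
    k4 = [h * v for v in f(t + h, [w[j] + k3[j] for j in range(m)])]
    return [w[j] + (k1[j] + 2 * k2[j] + 2 * k3[j] + k4[j]) / 6 for j in range(m)]


def adams_pc_system(f, a, b, alpha, N):
    """Adams fourth-order predictor-corrector for u' = f(t, u), u(a) = alpha.

    f(t, u) takes a list u of length m and returns a list of length m.
    """
    m = len(alpha)
    h = (b - a) / N                      # Step 1
    t = [a + i * h for i in range(N + 1)]
    w = [list(alpha)]                    # Step 2
    for i in range(1, min(3, N) + 1):    # Steps 4-11: Runge-Kutta start
        w.append(rk4_step(f, t[i - 1], w[i - 1], h))
    F = [f(t[i], w[i]) for i in range(len(w))]  # stored derivative values
    for i in range(4, N + 1):            # Steps 12-16
        # Step 14: Adams-Bashforth four-step predictor
        pred = [w[i - 1][j] + h * (55 * F[i - 1][j] - 59 * F[i - 2][j]
                                   + 37 * F[i - 3][j] - 9 * F[i - 4][j]) / 24
                for j in range(m)]
        Fp = f(t[i], pred)
        # Step 15: Adams-Moulton three-step corrector
        corr = [w[i - 1][j] + h * (9 * Fp[j] + 19 * F[i - 1][j]
                                   - 5 * F[i - 2][j] + F[i - 3][j]) / 24
                for j in range(m)]
        w.append(corr)
        F.append(f(t[i], corr))
    return t, w


# Assumed test problem: u1' = u2, u2' = -u1, u(0) = (1, 0),
# whose solution is (cos t, -sin t).
t, w = adams_pc_system(lambda t, u: [u[1], -u[0]], 0.0, 1.0, [1.0, 0.0], 20)
print(t[-1], w[-1], math.cos(1.0), -math.sin(1.0))
```

Storing the derivative values F avoids re-evaluating f at earlier mesh points, which is the usual reason multistep methods are cheaper per step than Runge-Kutta.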


7. The Adams fourth-order predictor-corrector method for systems applied to the problems in Exercise 1 gives the results in the 
following tables. 


a. t_i w_{1i} u_{1i} w_{2i} u_{2i}
0.200 2.12036583 2.12500839 1.50699185 1.51158743 
0.400 4.44122776 4.46511961 3.24224021 3.26598528 
0.600 9.73913329 9.83235869 8.16341700 8.25629549 
0.800 22.52673210 23.00263945 21.20273983 21.66887674 
1.000 54.81242211 56.73748265 55.20490157 57.10536209 
b. t_i w_{1i} u_{1i} w_{2i} u_{2i}
0.500 0.95675505 0.95672798 —1.08385916 —1.08383310 
1.000 1.30659995 1.30655930 —0.83300571 —0.83296776 
1.500 1.34420613 1.34418117 —0.56983853 —0.56981634 
2.000 1.14334795 1.14333672 —0.36938396 —0.36937457 
c. t_i w_{1i} u_{1i} w_{2i} u_{2i} w_{3i} u_{3i}
0.5 0.70787076 0.70828683 —1.24988663 —1.25056425 0.39884862 0.39815702 
1.0 —0.33691753 —0.33650854 —3.01764179 —3.01945051 —0.29932294 —0.30116868 
1.5 —2.41332734 —2.41345688 —5.40523279 —5.40844686 —0.92346873 —0.92675778 
2.0 -5.88968402 -5.89590551 -8.72213325 -8.71450036 -1.32972524 -1.32544426
d. t_i w_{1i} u_{1i} w_{2i} u_{2i} w_{3i} u_{3i}
0.2 1.38165297 1.38165325 1.00800000 1.00800000 —0.61833075 —0.61833075 
0.5 1.90752882 1.90753184 1.12500000 1.12500000 —0.09090527 —0.09090566 
0.7 2.25503040 2.25503620 1.34300000 1.34300000 0.26344040 0.26343970 
1.0 2.83211032 2.83212056 2.00000000 2.00000000 0.88212163 0.88212056 


9. The predicted number of prey, x_{1i}, and predators, x_{2i}, are given in the following table.


i t_i x_{1i} x_{2i}
10 1.0 4393 1512 
20 2.0 288 3175 
30 3.0 32 2042 
40 4.0 25 1258 


Exercise Set 5.10 (Page 347) 
1. Let L be the Lipschitz constant for $\phi$. Then
$$u_{i+1} - v_{i+1} = u_i - v_i + h[\phi(t_i, u_i, h) - \phi(t_i, v_i, h)],$$
so
$$|u_{i+1} - v_{i+1}| \le (1 + hL)|u_i - v_i| \le (1 + hL)^{i+1}|u_0 - v_0|.$$


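3. By Exercise 32 in Section 5.4, we have

$$\phi(t, w, h) = \frac{1}{6} f(t, w) + \frac{1}{3} f\left(t + \frac{h}{2},\, w + \frac{h}{2} f(t, w)\right) + \frac{1}{3} f\left(t + \frac{h}{2},\, w + \frac{h}{2} f\left(t + \frac{h}{2},\, w + \frac{h}{2} f(t, w)\right)\right)$$

$$\qquad + \frac{1}{6} f\left(t + h,\, w + h f\left(t + \frac{h}{2},\, w + \frac{h}{2} f\left(t + \frac{h}{2},\, w + \frac{h}{2} f(t, w)\right)\right)\right),$$

so

$$\phi(t, w, 0) = \frac{1}{6} f(t, w) + \frac{1}{3} f(t, w) + \frac{1}{3} f(t, w) + \frac{1}{6} f(t, w) = f(t, w).$$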


5. a. The local truncation error is $\tau_{i+1} = \frac{1}{4} h^3 y^{(4)}(\xi_i)$, for some $\xi_i$, where $t_{i-2} < \xi_i < t_{i+1}$.
b. The method is consistent but unstable and not convergent. 
7. The method is unstable. 


Exercise Set 5.11 (Page 354) 


1. Euler’s method gives the results in the following tables. 


a. t_i w_i y_i b. t_i w_i y_i
0.200 0.027182818 0.449328964 0.200 0.373333333 0.046105213
0.500 0.000027183 0.030197383 0.500 -0.093333333 0.250015133
0.700 0.000000272 0.004991594 0.700 0.146666667 0.490000277
1.000 0.000000000 0.000335463 1.000 1.333333333 1.000000001
c. t_i w_i y_i d. t_i w_i y_i
0.500 16.47925 0.479470939 0.200 6.128259 1.000000001 
1.000 256.7930 0.841470987 0.500 —378.2574 1.000000000 
1.500 4096.142 0.997494987 0.700 —6052.063 1.000000000 
2.000 65523.12 0.909297427 1.000 387332.0 1.000000000 


3. The Runge-Kutta fourth order method gives the results in the following tables. 


a. t_i w_i y_i b. t_i w_i y_i
0.200 0.45881186 0.44932896 0.200 0.07925926 0.04610521 
0.500 0.03181595 0.03019738 0.500 0.25386145 0.25001513 
0.700 0.00537013 0.00499159 0.700 0.49265127 0.49000028 
1.000 0.00037239 0.00033546 1.000 1.00250560 1.00000000 

c. t_i w_i y_i d. t_i w_i y_i
0.500 188.3082 0.47947094 0.200 —215.7459 1.00000000 
1.000 35296.68 0.84147099 0.500 —555750.0 1.00000000 
1.500 6632737 0.99749499 0.700 -104435653 1.00000000
2.000 1246413200 0.90929743 1.000 —269031268010 1.00000000 


5. The Adams Fourth-Order Predictor-Corrector Algorithm gives the results in the following tables. 


a. t_i w_i y_i b. t_i w_i y_i
0.200 0.4588119 0.4493290 0.200 0.0792593 0.0461052
0.500 -0.0112813 0.0301974 0.500 0.1554027 0.2500151
0.700 0.0013734 0.0049916 0.700 0.5507445 0.4900003
1.000 0.0023604 0.0003355 1.000 0.7278557 1.0000000
c. t_i w_i y_i d. t_i w_i y_i
0.500 188.3082 0.4794709 0.200 -215.7459 1.000000001
1.000 38932.03 0.8414710 0.500 —682637.0 1.000000000 
1.500 9073607 0.9974950 0.700 —159172736 1.000000000 
2.000 2115741299 0.9092974 1.000 —566751172258 1.000000000 


7. The Trapezoidal Algorithm gives the results in the following tables. 


a. t_i w_i k y_i b. t_i w_i k y_i
0.200 0.39109643 2 0.44932896 0.200 0.04000000 2 0.04610521 
0.500 0.02134361 2 0.03019738 0.500 0.25000000 2 0.25001513 
0.700 0.00307084 2 0.00499159 0.700 0.49000000 2 0.49000028 
1.000 0.00016759 2 0.00033546 1.000 1.00000000 2 1.00000000 

c. t_i w_i k y_i d. t_i w_i k y_i
0.500 0.66291133 2 0.47947094 0.200 —1.07568307 4 1.00000000 
1.000 0.87506346 2 0.84147099 0.500 —0.97868360 4 1.00000000 
1.500 1.00366141 2 0.99749499 0.700 —0.99046408 3 1.00000000 
2.000 0.91053267 2 0.90929743 1.000 -1.00284456 3 1.00000000
9. a. t_i w_{1i} u_{1i} w_{2i} u_{2i}
0.100 —96.33011 0.66987648 193.6651 —0.33491554 
0.200 —28226.32 0.67915383 56453.66 —0.33957692 
0.300 —8214056 0.69387881 16428113 —0.34693941 
0.400 —2390290586 0.71354670 4780581173 —0.35677335 
0.500 —695574560790 0.73768711 1391149121600 —0.36884355 

b. t_i w_{1i} u_{1i} w_{2i} u_{2i}
0.100 0.61095960 0.66987648 —0.21708179 —0.33491554 
0.200 0.66873489 0.67915383 —0.31873903 —0.33957692 
0.300 0.69203679 0.69387881 —0.34325535 —0.34693941 
0.400 0.71322103 0.71354670 —0.35612202 —0.35677335 
0.500 0.73762953 0.73768711 —0.36872840 —0.36884355 


11. Using (4.25) on page 199 gives $\tau_{i+1} = -\frac{y'''(\xi_i)}{12} h^2$, for some $t_i < \xi_i < t_{i+1}$, and by Definition 5.18, the Trapezoidal method is consistent. Once again using (4.25) gives

$$y(t_{i+1}) = y(t_i) + \frac{h}{2}\left[f(t_i, y(t_i)) + f(t_{i+1}, y(t_{i+1}))\right] - \frac{y'''(\xi_i)}{12} h^3.$$

Subtracting the difference equation and using the Lipschitz constant L for f gives

$$|y(t_{i+1}) - w_{i+1}| \le |y(t_i) - w_i| + \frac{hL}{2}|y(t_i) - w_i| + \frac{hL}{2}|y(t_{i+1}) - w_{i+1}| + \frac{h^3}{12}|y'''(\xi_i)|.$$

Let $M = \max_{a \le x \le b} |y'''(x)|$. Then, assuming $hL \ne 2$,

$$|y(t_{i+1}) - w_{i+1}| \le \frac{2 + hL}{2 - hL}|y(t_i) - w_i| + \frac{h^3}{6(2 - hL)} M.$$

Using Lemma 5.8 on page 270 gives

$$|y(t_{i+1}) - w_{i+1}| \le e^{2(b-a)L/(2 - hL)}\left[\frac{Mh^2}{12L} + |\alpha - w_0|\right] - \frac{Mh^2}{12L}.$$

Thus, if $hL \ne 2$, the Trapezoidal method is convergent, and consequently stable.


13. b. The following tables list the results of the Backward Euler method applied to the problems in Exercise 1. 


a. i t_i w_i k y_i
2 0.20 0.75298666 2 0.44932896
5 0.50 0.10978082 2 0.03019738
7 0.70 0.03041020 2 0.00499159
10 1.00 0.00443362 2 0.00033546

b. i t_i w_i k y_i
2 0.20 0.08148148 2 0.04610521
5 0.50 0.25635117 2 0.25001513
7 0.70 0.49515013 2 0.49000028
10 1.00 1.00500556 2 1.00000000

c. i t_i w_i k y_i
2 0.50 0.50495522 2 0.47947094 
4 1.00 0.83751817 2 0.84147099 
6 1.50 0.99145076 2 0.99749499 
8 2.00 0.90337560 2 0.90929743 
d. i t_i w_i k y_i
2 0.20 1.00348713 3 1.00000000 
5 0.50 1.00000262 2 1.00000000 
7 0.70 1.00000002 1 1.00000000 
10 1.00 1.00000000 1 1.00000000 


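15. a. The Trapezoidal method applied to the test equation gives

$$w_{i+1} = \frac{1 + \frac{h\lambda}{2}}{1 - \frac{h\lambda}{2}}\, w_i, \quad \text{so} \quad Q(h\lambda) = \frac{2 + h\lambda}{2 - h\lambda}.$$

Thus, $|Q(h\lambda)| < 1$ whenever $\mathrm{Re}(h\lambda) < 0$.
b. The Backward Euler method applied to the test equation gives

$$w_{i+1} = \frac{w_i}{1 - h\lambda}, \quad \text{so} \quad Q(h\lambda) = \frac{1}{1 - h\lambda}.$$

Thus, $|Q(h\lambda)| < 1$ whenever $\mathrm{Re}(h\lambda) < 0$.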
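The stability functions in Exercise 15 can be tried on a concrete stiff problem. The sketch below assumes that Exercise 1(a) of this set is y' = -9y, y(0) = e, with h = 0.1. That problem statement does not appear in these answers; it is inferred here because it reproduces the tabulated first rows (Euler 0.027182818, Trapezoidal 0.39109643, Backward Euler 0.75298666 at t = 0.2). For the linear test equation, each method reduces to repeated multiplication by Q(hλ).

```python
import math


def q_euler(z):
    # Euler: w_{i+1} = (1 + h*lambda) w_i
    return 1 + z


def q_trapezoidal(z):
    # Trapezoidal: Q(h*lambda) = (2 + h*lambda) / (2 - h*lambda)
    return (2 + z) / (2 - z)


def q_backward_euler(z):
    # Backward Euler: Q(h*lambda) = 1 / (1 - h*lambda)
    return 1 / (1 - z)


def iterate(q, lam, h, w0, n):
    # For y' = lambda*y each step multiplies by Q(h*lambda)
    return w0 * q(h * lam) ** n


lam, h, w0 = -9.0, 0.1, math.e  # assumed data for Exercise 1(a)
for name, q in [("Euler", q_euler),
                ("Trapezoidal", q_trapezoidal),
                ("Backward Euler", q_backward_euler)]:
    print(name, iterate(q, lam, h, w0, 2), abs(q(h * lam)))
```

Both implicit methods give |Q| < 1 for every h when Re(hλ) < 0, whereas Euler's factor 1 + hλ exceeds 1 in magnitude once hλ < -2, which is the behavior visible in the rapidly growing Euler entries for parts (c) and (d).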
Exercise Set 6.1 (Page 368) 
1. a. Intersecting lines with solution x_1 = x_2 = 1.
b. One line, so there is an infinite number of solutions with x_2 = 3/2 - (1/2)x_1.
c. One line, so there is an infinite number of solutions with x_2 = -(1/2)x_1.
d. Intersecting lines with solution x; = 2 and x2 = i. 
3. a. x_1 = 1.0, x_2 = -0.98, x_3 = 2.9 b. x_1 = 1.1, x_2 = -1.1, x_3 = 2.9


5. Gaussian elimination gives the following solutions.
a. x_1 = 1.1875, x_2 = 1.8125, x_3 = 0.875 with one row interchange required
b. x_1 = -1, x_2 = 0, x_3 = 1 with no interchange required
c. x_1 = 1.5, x_2 = 2, x_3 = -1.2, x_4 = 3 with no interchange required
d. No unique solution
7. Gaussian elimination with single precision arithmetic gives the following solutions:
a. x_1 = -227.0769, x_2 = 476.9231, x_3 = -177.6923;
b. x_1 = 1.001291, x_2 = 1, x_3 = 1.00155;
c. x_1 = -0.03174600, x_2 = 0.5952377, x_3 = -2.380951, x_4 = 2.777777;
d. x_1 = 1.918129, x_2 = 1.964912, x_3 = -0.9883041, x_4 = -3.192982, x_5 = -1.134503.
9. a. When α = -1/3, there is no solution.
b. When α = 1/3, there is an infinite number of solutions with x_1 = x_2 + 1.5, and x_2 is arbitrary.
c. If α ≠ ±1/3, then the unique solution is
$$x_1 = \frac{3}{2(1 + 3\alpha)} \quad \text{and} \quad x_2 = \frac{-3}{2(1 + 3\alpha)}.$$


13. The Gauss-Jordan method gives the following results.
a. x_1 = 0.98, x_2 = -0.98, x_3 = 2.9 b. x_1 = 1.1, x_2 = -1.0, x_3 = 2.9
15. b. The results for this exercise are listed in the following table. (The abbreviations M/D and A/S are used for multiplications/divisions and additions/subtractions, respectively.)


       Gaussian Elimination    Gauss-Jordan
n      M/D       A/S           M/D       A/S
3      17        11            21        12
10     430       375           595       495
50     44150     42875         64975     62475
100    343300    338250        509950    499950


17. The Gaussian-Elimination-Gauss-Jordan hybrid method gives the following results.
a. x_1 = 1.0, x_2 = -0.98, x_3 = 2.9 b. x_1 = 1.0, x_2 = -1.0, x_3 = 2.9
19. a. There is sufficient food to satisfy the average daily consumption.


b. We could add 200 of species 1, or 150 of species 2, or 100 of species 3, or 100 of species 4.


c. Assuming none of the increases indicated in part (b) was selected, species 2 could be increased by 650, or species 3 could be increased by 150, or species 4 could be increased by 150.


d. Assuming none of the increases indicated in parts (b) or (c) were selected, species 3 could be increased by 150, or species 4 could be increased by 150.
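The operation counts tabulated in Exercise 15(b) of Exercise Set 6.1 agree with the usual closed-form counts for solving an n × n system: n^3/3 + n^2 - n/3 multiplications/divisions and n^3/3 + n^2/2 - 5n/6 additions/subtractions for Gaussian elimination with backward substitution, and n^3/2 + n^2 - n/2 and n^3/2 - n/2 for Gauss-Jordan. The sketch below evaluates these formulas with exact integer arithmetic; the function names are illustrative, and the formulas are stated here as the expressions that reproduce the table rather than quoted from the exercise.

```python
def gaussian_elimination_counts(n):
    # (multiplications/divisions, additions/subtractions)
    md = (n**3 + 3 * n**2 - n) // 3          # n^3/3 + n^2 - n/3
    add_sub = (2 * n**3 + 3 * n**2 - 5 * n) // 6  # n^3/3 + n^2/2 - 5n/6
    return md, add_sub


def gauss_jordan_counts(n):
    # (multiplications/divisions, additions/subtractions)
    md = (n**3 + 2 * n**2 - n) // 2          # n^3/2 + n^2 - n/2
    add_sub = (n**3 - n) // 2                # n^3/2 - n/2
    return md, add_sub


for n in (3, 10, 50, 100):
    print(n, gaussian_elimination_counts(n), gauss_jordan_counts(n))
```

For large n the Gauss-Jordan counts are roughly 50% larger, since the leading terms are n^3/2 against n^3/3.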


Exercise Set 6.2 (Page 379) 


1. a. none b. Interchange rows 2 and 3.
c. none d. Interchange rows 1 and 2.
3. a. Interchange rows 1 and 2. b. Interchange rows 1 and 3.
c. Interchange rows 1 and 2, then interchange rows 2 and 3. d. Interchange rows 1 and 2.
5. a. Interchange rows 1 and 3, then interchange rows 2 and 3. b. Interchange rows 2 and 3.
c. Interchange rows 2 and 3. d. Interchange rows 1 and 3, then interchange rows 2 and 3.
7. a. Interchange rows 1 and 2, and columns 1 and 3, then interchange rows 2 and 3, and columns 2 and 3.
b. Interchange rows 1 and 2, and columns 1 and 3, then interchange rows 2 and 3.
c. Interchange rows 1 and 2, and columns 1 and 3, then interchange rows 2 and 3.
d. Interchange rows 1 and 2, and columns 1 and 2, then interchange rows 2 and 3, and columns 2 and 3.
9. Gaussian elimination with three-digit chopping arithmetic gives the following results.
a. x_1 = 30.0, x_2 = 0.990 b. x_1 = 0.00, x_2 = 10.0, x_3 = 0.142
c. x_1 = 0.206, x_2 = 0.0154, x_3 = -0.0156, x_4 = -0.716 d. x_1 = 0.828, x_2 = -3.32, x_3 = 0.153, x_4 = 4.91
11. Gaussian elimination with three-digit rounding arithmetic gives the following results.
a. x_1 = -10.0, x_2 = 1.01 b. x_1 = 0.00, x_2 = 10.0, x_3 = 0.143
c. x_1 = 0.185, x_2 = 0.0103, x_3 = -0.0200, x_4 = -1.12 d. x_1 = 0.799, x_2 = -3.12, x_3 = 0.151, x_4 = 4.56
13. Gaussian elimination with partial pivoting and three-digit chopping arithmetic gives the following results.
a. x_1 = 10.0, x_2 = 1.00 b. x_1 = -0.163, x_2 = 9.98, x_3 = 0.142
c. x_1 = 0.177, x_2 = -0.0072, x_3 = -0.0208, x_4 = -1.18 d. x_1 = 0.777, x_2 = -3.10, x_3 = 0.161, x_4 = 4.50
15. Gaussian elimination with partial pivoting and three-digit rounding arithmetic gives the following results.
a. x_1 = 10.0, x_2 = 1.00 b. x_1 = 0.00, x_2 = 10.0, x_3 = 0.143
c. x_1 = 0.178, x_2 = 0.0127, x_3 = -0.0204, x_4 = -1.16 d. x_1 = 0.845, x_2 = -3.37, x_3 = 0.182, x_4 = 5.07
17. Gaussian elimination with scaled partial pivoting and three-digit chopping arithmetic gives the following results.
a. x_1 = 10.0, x_2 = 1.00 b. x_1 = -0.163, x_2 = 9.98, x_3 = 0.142
c. x_1 = 0.171, x_2 = 0.0102, x_3 = -0.0217, x_4 = -1.27 d. x_1 = 0.687, x_2 = -2.66, x_3 = 0.117, x_4 = 3.59
19. Gaussian elimination with scaled partial pivoting and three-digit rounding arithmetic gives the following results.
a. x_1 = 10.0, x_2 = 1.00 b. x_1 = 0.00, x_2 = 10.0, x_3 = 0.143
c. x_1 = 0.180, x_2 = 0.0128, x_3 = -0.0200, x_4 = -1.13 d. x_1 = 0.783, x_2 = -3.12, x_3 = 0.147, x_4 = 4.53
21. Using Algorithm 6.1 in Maple with Digits:=10 gives
a. x_1 = 10.00000000, x_2 = 1.000000000
b. x_1 = 0.000000033, x_2 = 10.00000001, x_3 = 0.1428571429
c. x_1 = 0.1768252958, x_2 = 0.0126926913, x_3 = -0.0206540503, x_4 = -1.182608714
d. x_1 = 0.7883937842, x_2 = -3.125413672, x_3 = 0.1675965951, x_4 = 4.557002521
23. Using Algorithm 6.2 in Maple with Digits:=10 gives
a. x_1 = 10.00000000, x_2 = 1.000000000
b. x_1 = 0.000000000, x_2 = 10.00000000, x_3 = 0.142857142
c. x_1 = 0.1768252975, x_2 = 0.0126926909, x_3 = -0.0206540502, x_4 = -1.182608696
d. x_1 = 0.7883937863, x_2 = -3.125413680, x_3 = 0.1675965980, x_4 = 4.557002510
25. Using Algorithm 6.3 in Maple with Digits:=10 gives
a. x_1 = 10.00000000, x_2 = 1.000000000
b. x_1 = 0.000000000, x_2 = 10.00000000, x_3 = 0.1428571429
c. x_1 = 0.1768252977, x_2 = 0.0126926909, x_3 = -0.0206540501, x_4 = -1.182608693
d. x_1 = 0.7883937842, x_2 = -3.125413672, x_3 = 0.1675965952, x_4 = 4.55700252
27. a. x_1 = 9.98, x_2 = 1.00 b. x_1 = 0.0724, x_2 = 10.0, x_3 = 0.0952
c. x_1 = 0.161, x_2 = 0.0125, x_3 = -0.0232, x_4 = -1.42 d. x_1 = 0.719, x_2 = -2.86, x_3 = 0.146, x_4 = 4.00
29. a. x_1 = 10.0, x_2 = 1.00 b. x_1 = 0.00, x_2 = 10.0, x_3 = 0.143
c. x_1 = 0.179, x_2 = 0.0127, x_3 = -0.0203, x_4 = -1.15 d. x_1 = 0.874, x_2 = -3.49, x_3 = 0.192, x_4 = 5.33
31. Only for (a), where α = 6.
33. Using the Complete Pivoting Algorithm in Maple with Digits:=10 gives
a. x_1 = 10.00000000, x_2 = 1.000000000
b. x_1 = 0.000000000, x_2 = 10.00000000, x_3 = 0.1428571429
c. x_1 = 0.1768252974, x_2 = 0.01269269087, x_3 = -0.02065405015, x_4 = -1.182608697
d. x_1 = 0.7883937840, x_2 = -3.125413669, x_3 = 0.1675965971, x_4 = 4.557002516


Exercise Set 6.3 (Page 390) 


5 


im” 
-—— 
—] 
Ld 
a | 
YN WwW 
—— | 
3 
a | 
o 
x 


4 
a. 
ai 0 | 
=h 3 —3 =2 
—4 10 11 4 -8 
a b. c 3 4 -ll d. —14 
1 15 6 13 -12 
-6 -7 -4 6 
1 
oe ae 4 
4 4 4 = 
. a, The matrix is singular. b. 3 -; -; c. The matrix is singular. d. . 
1 _5 3 | 28 
8 8 8 eel 
2 


7. The solutions to the linear systems obtained in parts (a) and (b) are, from left to right, 


3,-—6,—2,-1 and 1,1,1,1. 


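9. a. Suppose $\tilde{A}$ and $\hat{A}$ are both inverses of $A$. Then $A\tilde{A} = \tilde{A}A = I$ and $A\hat{A} = \hat{A}A = I$. Thus,

$$\tilde{A} = \tilde{A}I = \tilde{A}(A\hat{A}) = (\tilde{A}A)\hat{A} = I\hat{A} = \hat{A}.$$

b. $(AB)(B^{-1}A^{-1}) = A(BB^{-1})A^{-1} = AIA^{-1} = AA^{-1} = I$ and $(B^{-1}A^{-1})(AB) = B^{-1}(A^{-1}A)B = B^{-1}IB = B^{-1}B = I$, so
$(AB)^{-1} = B^{-1}A^{-1}$ since there is only one inverse.

c. Since $A^{-1}A = AA^{-1} = I$, it follows that $A^{-1}$ is nonsingular. Since the inverse is unique, we have $(A^{-1})^{-1} = A$.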
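
The identities in parts (b) and (c) can be spot-checked numerically. The following is a minimal sketch, not part of the original answer: the 2 x 2 matrices `A` and `B` and the helper names `matmul` and `inverse_2x2` are hypothetical choices made only for illustration, and exact rational arithmetic is used so that equality can be tested directly.

```python
from fractions import Fraction


def matmul(X, Y):
    """Product of two matrices stored as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]


def inverse_2x2(M):
    """Inverse of a 2 x 2 matrix via the adjugate formula."""
    (a, b), (c, d) = M
    det = a * d - b * c
    if det == 0:
        raise ValueError("matrix is singular")
    return [[d / det, -b / det], [-c / det, a / det]]


# Hypothetical nonsingular matrices, chosen only for illustration.
A = [[Fraction(2), Fraction(1)], [Fraction(1), Fraction(3)]]
B = [[Fraction(1), Fraction(4)], [Fraction(0), Fraction(2)]]

left = inverse_2x2(matmul(A, B))                  # (AB)^{-1}
right = matmul(inverse_2x2(B), inverse_2x2(A))    # B^{-1} A^{-1}
double_inverse = inverse_2x2(inverse_2x2(A))      # (A^{-1})^{-1}
```

Under these assumptions `left` equals `right` and `double_inverse` equals `A`, which is what parts (b) and (c) assert.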


-16 | 
1 
7 
1 
0 00 
1 
1 
u 10 
1-11 
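11. a. If $C = AB$, where $A$ and $B$ are lower triangular, then $a_{ik} = 0$ if $k > i$ and $b_{kj} = 0$ if $k < j$. Thus,

$$c_{ij} = \sum_{k=1}^{n} a_{ik}b_{kj} = \sum_{k=j}^{i} a_{ik}b_{kj},$$

which will have the sum zero unless $j \le i$. Hence $C$ is lower triangular.
b. We have $a_{ik} = 0$ if $k < i$ and $b_{kj} = 0$ if $k > j$. The steps are similar to those in part (a).

c. Let $L$ be a nonsingular lower triangular matrix. To obtain the $i$th column of $L^{-1}$, solve $n$ linear systems of the form

$$\begin{bmatrix} l_{11} & 0 & \cdots & 0 \\ l_{21} & l_{22} & \ddots & \vdots \\ \vdots & & \ddots & 0 \\ l_{n1} & l_{n2} & \cdots & l_{nn} \end{bmatrix}
\begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{bmatrix} =
\begin{bmatrix} 0 \\ \vdots \\ 1 \\ \vdots \\ 0 \end{bmatrix},$$

where the 1 appears in the $i$th position to obtain the $i$th column of $L^{-1}$.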
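
Part (c) amounts to one forward substitution per column. The sketch below illustrates the idea under stated assumptions: the matrix `L` is a hypothetical example, and the function names `forward_substitution` and `lower_triangular_inverse` are not from the text.

```python
from fractions import Fraction


def forward_substitution(L, b):
    """Solve L x = b for a nonsingular lower triangular L."""
    n = len(L)
    x = [Fraction(0)] * n
    for i in range(n):
        partial = sum(L[i][j] * x[j] for j in range(i))
        x[i] = (b[i] - partial) / L[i][i]
    return x


def lower_triangular_inverse(L):
    """Build L^{-1} column by column by solving L x = e_i for each i."""
    n = len(L)
    columns = []
    for i in range(n):
        e_i = [Fraction(1) if r == i else Fraction(0) for r in range(n)]
        columns.append(forward_substitution(L, e_i))
    # columns[j] is the j-th column, so transpose to get rows.
    return [[columns[j][i] for j in range(n)] for i in range(n)]


# Hypothetical nonsingular lower triangular matrix (illustration only).
L = [[Fraction(2), Fraction(0), Fraction(0)],
     [Fraction(1), Fraction(3), Fraction(0)],
     [Fraction(4), Fraction(-1), Fraction(5)]]
L_inv = lower_triangular_inverse(L)
```

In this example the computed inverse is itself lower triangular, in line with the structure established in part (a).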


13. The answers are the same as those in Exercise 5. 


15. a.
       [  0   2   0 ]          [ 1  0  0 ]
A^2 =  [  0   0   3 ],  A^3 =  [ 0  1  0 ],  A^4 = A,  A^5 = A^2,  A^6 = I, ...
       [ 1/6  0   0 ]          [ 0  0  1 ]

b.         Year 1    Year 2    Year 3    Year 4
   Age 1    6000     36000     12000      6000
   Age 2    6000      3000     18000      6000
   Age 3    6000      2000      1000      6000

c.
          [  0   2   0 ]
A^{-1} =  [  0   0   3 ].  The i,j-entry is the number of beetles of age i necessary to produce one beetle of age j.
          [ 1/6  0   0 ]
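
The table in part (b) can be reproduced by repeated matrix-vector products. The sketch below is an illustration only: the matrix `A` is not printed in this answer, so the entries used here (6 in the first row, 1/2 and 1/3 on the subdiagonal) are an assumption inferred from the Year 1 to Year 2 transition in the table.

```python
from fractions import Fraction as F

# Assumed population matrix, inferred from the table (not stated in the answer).
A = [[F(0), F(0), F(6)],
     [F(1, 2), F(0), F(0)],
     [F(0), F(1, 3), F(0)]]


def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]


def matvec(M, v):
    return [sum(M[i][j] * v[j] for j in range(3)) for i in range(3)]


A2 = matmul(A, A)
A3 = matmul(A2, A)

# Year 1 population by age class, then three more years.
population = [F(6000), F(6000), F(6000)]
table = [population]
for _ in range(3):
    population = matvec(A, population)
    table.append(population)
```

With this assumed `A`, `A3` is the identity, so `A2` also serves as the inverse, and the population returns to its Year 1 values in Year 4.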
17. a. We have 
7 4 4 0 2(xp — X%1) tao +a; 2(xo — X1) + 3a + 3Q 
= —3 -—6 O 3 (x1 —X9) — a — 2a a 3(x| — Xo) — 3a; — 6a 
0 0 3 0 ao ~ 3a 
0 0 0 1 Xo Xo 


Oo CO wNrvr 


Exercise Set 6.4 (Page 399) 


1. The determinants of the matrices are: 
a. —8 b. 14 c. 0 d. 3 
3. The answers are the same as in Exercise 1. 
5. a=-3 and a = 2 
7. a = -5


Copyright 2010 Cengage Learning. All Rights Reserved. May not be copied, scanned, or duplicated, in whole or in part. Due to electronic rights, some third party content may be suppressed from the eBook and/or eChapter(s). 
Editorial review has deemed that any suppressed content does not materially affect the overall learning experience. Cengage Learning reserves the right to remove additional content at any time if subsequent rights restrictions require it. 


820 Answers for Selected Exercises 


9. When $n = 2$, $\det A = a_{11}a_{22} - a_{12}a_{21}$ requires 2 Multiplications and 1 Subtraction. Since

$$2!\sum_{k=1}^{1}\frac{1}{k!} = 2 \quad\text{and}\quad 2! - 1 = 1,$$

the formula holds for $n = 2$. Assume the formula is true for $n = 2,\ldots,m$, and let $A$ be an $(m+1) \times (m+1)$ matrix. Then

$$\det A = \sum_{j=1}^{m+1} a_{ij}A_{ij},$$

for any $i$, where $1 \le i \le m+1$. To compute each $A_{ij}$ requires

$$m!\sum_{k=1}^{m-1}\frac{1}{k!} \ \text{Multiplications and} \ m! - 1 \ \text{Additions/Subtractions.}$$

Thus, the number of Multiplications for $\det A$ is

$$(m+1)\left[m!\sum_{k=1}^{m-1}\frac{1}{k!}\right] + (m+1) = (m+1)!\left[\sum_{k=1}^{m-1}\frac{1}{k!} + \frac{1}{m!}\right] = (m+1)!\sum_{k=1}^{m}\frac{1}{k!},$$

and the number of Additions/Subtractions is

$$(m+1)\left[m! - 1\right] + m = (m+1)! - 1.$$

By the principle of mathematical induction, the formula is valid for any $n \ge 2$.
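
The induction step translates directly into a recurrence that can be compared with the closed form. This is a hedged sketch: the function names `cofactor_counts` and `closed_form` are hypothetical, and the recurrence simply restates the counting argument above (an expansion of order n uses n cofactors of order n - 1, n extra multiplications, and n - 1 extra additions).

```python
from fractions import Fraction
from math import factorial


def cofactor_counts(n):
    """(multiplications, additions/subtractions) for cofactor expansion of order n."""
    if n == 1:
        return (0, 0)
    mults, adds = cofactor_counts(n - 1)
    return (n * mults + n, n * adds + (n - 1))


def closed_form(n):
    """n! * sum_{k=1}^{n-1} 1/k! multiplications and n! - 1 additions/subtractions."""
    mults = factorial(n) * sum(Fraction(1, factorial(k)) for k in range(1, n))
    return (int(mults), factorial(n) - 1)


counts_for_small_n = {n: cofactor_counts(n) for n in range(2, 9)}
```

For n = 2, 3, 4 the recurrence gives (2, 1), (9, 5), and (40, 23), matching the closed form in each case.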
11. The result follows from det AB = det A - det B and Theorem 6.17. 
13. a. If $D_i$ is the determinant of the matrix formed by replacing the $i$th column of $A$ with $b$ and if $D = \det A$, then

$$x_i = D_i/D, \quad\text{for } i = 1,\ldots,n.$$

b. $(n+1)!\left(\sum_{k=1}^{n-1}\frac{1}{k!}\right) + n$ Multiplications/Divisions
$(n+1)! - n - 1$ Additions/Subtractions.


Exercise Set 6.5 (Page 409) 


lia. x = -3,4% =3,%=1 bx = $3, =—-3,43= 2 
1 0 0 0 0 0 1 (0 
1 0 1 0 
3. a 0 ; b P=/]1 0 O eee oe SS) eee lo Se 
0 0 001 0 1 0 0 0 0 0 1 
0 0 0 1 1 0 0 0 
0 0 a ae 
5. a 1 1 O} andU=]0 45 7.5 
15 1 1 0 O -4 
1 0 0 1.012 —2.132 3.104 
b. L —2.106719 1 0} and U = 0 —0.3955257 —0.4737443 
3.067193 1.197756 1 0 0 —8.939141 
A I 0 00 0 0 0 q 
i. 15 0 O 
c. and U = 0 05 0 
=D ee : 1 b 0 0 1| 
0 2.175600 4.023099  —2.173199 5.196700 
d —1. 084 : ; 0 and U = 0 13.43947 —4.018660 10.80698 
: —0.4596433 —0.2501219 1 0 — 0 0 —0.8929510 5.091692 
2.768661 —0.3079435 —5.352283 1 0 0 0 12.03614 
7. a. x_1 = 1, x_2 = 2, x_3 = -1  b. x_1 = 1, x_2 = 1, x_3 = 1
c. x_1 = 1.5, x_2 = 2, x_3 = -1.199998, x_4 = 3
d. x_1 = 2.939851, x_2 = 0.07067770, x_3 = 5.677735, x_4 = 4.379812
O 1 Oj} ] 1 Oo o;f1 1 -t 1 0 O/]]1 0 OFF 1 2 -1 
9a P'LU=|1 0 0}1)0 1 O;}|0 2 3 b PLU={]0 0 1]]/2 1 O}]O —-5 6 
0 0 1}/[0 -} 1}/[0 0 3 0 1 Of[1 0 1}/{0 oO 4 
1 0 0 O}}1 0 0 O}}1 -2 3 0 
0 0 0 I]/}/2 1 0 0} 10 5 -2 1 
it = 
CPE To 1 oO OF. OHO 0: ai 23 
0 0 1 O}; [3 0 0 1)10 0 0 3 
E 0 0 q E 0 0 q F 2 3 
0 0 0 I]/}/2 1 0 0} 10 5 -3 -l 
it — 
ec 0 0 1 Oj;;1 0 1 O;;0 0 -l - 
lo 1 0 o| li 0 0 1 [Lo 0 0 1 
11. c.                     Multiplications/Divisions      Additions/Subtractions
   Factoring into LU       (1/3)n^3 - (1/3)n              (1/3)n^3 - (1/2)n^2 + (1/6)n
   Solving Ly = b          (1/2)n^2 - (1/2)n              (1/2)n^2 - (1/2)n
   Solving Ux = y          (1/2)n^2 + (1/2)n              (1/2)n^2 - (1/2)n
   Total                   (1/3)n^3 + n^2 - (1/3)n        (1/3)n^3 + (1/2)n^2 - (5/6)n
d.                         Multiplications/Divisions      Additions/Subtractions
   Factoring into LU       (1/3)n^3 - (1/3)n              (1/3)n^3 - (1/2)n^2 + (1/6)n
   Solving Ly^(k) = b^(k)  ((1/2)n^2 - (1/2)n)m           ((1/2)n^2 - (1/2)n)m
   Solving Ux^(k) = y^(k)  ((1/2)n^2 + (1/2)n)m           ((1/2)n^2 - (1/2)n)m
   Total                   (1/3)n^3 + mn^2 - (1/3)n       (1/3)n^3 + (m - 1/2)n^2 - (m - 1/6)n
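
Operation counts of this kind can be checked by instrumenting a small solver. The sketch below is an assumption-laden illustration, not the text's algorithm listing: it uses a Doolittle-style factorization (unit diagonal in L) without pivoting, a hypothetical strictly diagonally dominant test matrix, and the standard counts (n^3 - n)/3 and n^3/3 - n^2/2 + n/6 for the factorization, n(n - 1)/2 for the forward solve, and n(n + 1)/2 and n(n - 1)/2 for the backward solve.

```python
from fractions import Fraction


def lu_solve_with_counts(A, b):
    """Factor A = LU (no pivoting), solve Ly = b and Ux = y, and count operations.

    Returns (x, counts) with counts[stage] = [mult_div, add_sub].
    """
    n = len(A)
    U = [[Fraction(v) for v in row] for row in A]
    L = [[Fraction(int(i == j)) for j in range(n)] for i in range(n)]
    counts = {"factor": [0, 0], "forward": [0, 0], "backward": [0, 0]}

    for k in range(n - 1):
        for i in range(k + 1, n):
            L[i][k] = U[i][k] / U[k][k]          # one division per multiplier
            counts["factor"][0] += 1
            U[i][k] = Fraction(0)
            for j in range(k + 1, n):
                U[i][j] -= L[i][k] * U[k][j]     # one multiply, one subtract
                counts["factor"][0] += 1
                counts["factor"][1] += 1

    y = []
    for i in range(n):                           # forward solve, unit diagonal
        s = Fraction(b[i])
        for j in range(i):
            s -= L[i][j] * y[j]
            counts["forward"][0] += 1
            counts["forward"][1] += 1
        y.append(s)

    x = [Fraction(0)] * n
    for i in range(n - 1, -1, -1):               # backward solve
        s = y[i]
        for j in range(i + 1, n):
            s -= U[i][j] * x[j]
            counts["backward"][0] += 1
            counts["backward"][1] += 1
        x[i] = s / U[i][i]
        counts["backward"][0] += 1
    return x, counts


def expected_counts(n):
    """Standard closed-form counts for each stage."""
    n = Fraction(n)
    return {
        "factor": [n**3 / 3 - n / 3, n**3 / 3 - n**2 / 2 + n / 6],
        "forward": [n**2 / 2 - n / 2, n**2 / 2 - n / 2],
        "backward": [n**2 / 2 + n / 2, n**2 / 2 - n / 2],
    }


# Hypothetical strictly diagonally dominant system with known solution.
A = [[4, 1, 0, 1], [1, 5, 2, 0], [0, 2, 6, 1], [1, 0, 1, 7]]
x_true = [1, 2, 3, 4]
b = [sum(A[i][j] * x_true[j] for j in range(4)) for i in range(4)]
x, counts = lu_solve_with_counts(A, b)
```

Counting every arithmetic operation, including those on zero entries, is what makes the tallies agree with the closed forms; a solver that skips zeros would report fewer.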
Exercise Set 6.6 (Page 425) 
1. i. The only symmetric matrix is (a). 
ii. All are nonsingular. 
iii. Matrices (a) and (b) are strictly diagonally dominant. 
iv. The only positive definite matrix is (a). 
3. a.
L = [   1     0    0 ]      D = [ 2    0     0  ]
    [ -1/2    1    0 ]          [ 0   3/2    0  ]
    [   0   -2/3   1 ]          [ 0    0    4/3 ]

b.
L = [ 1.0    0.0           0.0          0.0 ]      D = [ 4.0  0.0   0.0        0.0       ]
    [ 0.25   1.0           0.0          0.0 ]          [ 0.0  2.75  0.0        0.0       ]
    [ 0.25  -0.45454545    1.0          0.0 ]          [ 0.0  0.0   1.1818182  0.0       ]
    [ 0.25   0.27272727    0.076923077  1.0 ]          [ 0.0  0.0   0.0        1.5384615 ]

c.
L = [  1.0    0.0          0.0   0.0 ]      D = [ 4.0  0.0   0.0        0.0  ]
    [  0.25   1.0          0.0   0.0 ]          [ 0.0  2.75  0.0        0.0  ]
    [ -0.25  -0.27272727   1.0   0.0 ]          [ 0.0  0.0   4.5454545  0.0  ]
    [  0.0    0.0          0.44  1.0 ]          [ 0.0  0.0   0.0        3.12 ]

d.
L = [  1.0         0.0   0.0          0.0 ]      D = [ 6.0  0.0        0.0  0.0       ]
    [  0.33333333  1.0   0.0          0.0 ]          [ 0.0  3.3333333  0.0  0.0       ]
    [  0.16666667  0.2   1.0          0.0 ]          [ 0.0  0.0        3.7  0.0       ]
    [ -0.16666667  0.1  -0.24324324   1.0 ]          [ 0.0  0.0        0.0  2.5810811 ]
5. Cholesky's Algorithm gives the following results.

a.
L = [  1.414213    0          0        ]
    [ -0.7071069   1.224743   0        ]
    [  0          -0.8164972  1.154699 ]

b.
L = [ 2     0           0           0        ]
    [ 0.5   1.658311    0           0        ]
    [ 0.5  -0.7537785   1.087113    0        ]
    [ 0.5   0.4522671   0.08362442  1.240346 ]

c.
L = [  2     0           0          0        ]
    [  0.5   1.658311    0          0        ]
    [ -0.5  -0.4522671   2.132006   0        ]
    [  0     0           0.9380833  1.766351 ]

d.
L = [  2.449489   0          0          0        ]
    [  0.8164966  1.825741   0          0        ]
    [  0.4082483  0.3651483  1.923538   0        ]
    [ -0.4082483  0.1825741 -0.4678876  1.606574 ]
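
The factorizations in Exercises 3 and 5 are linked: if A = L D L^t, then the Cholesky factor is L D^{1/2}. The sketch below checks that relation using the values reported for part (d) of both exercises. It is only an illustration; the variable names are hypothetical, and agreement is tested to the number of digits printed in the answers rather than exactly.

```python
import math

# L and the diagonal of D as reported for Exercise 3(d).
L = [[1.0, 0.0, 0.0, 0.0],
     [0.33333333, 1.0, 0.0, 0.0],
     [0.16666667, 0.2, 1.0, 0.0],
     [-0.16666667, 0.1, -0.24324324, 1.0]]
D = [6.0, 3.3333333, 3.7, 2.5810811]

# Cholesky factor as reported for Exercise 5(d).
reported = [[2.449489, 0.0, 0.0, 0.0],
            [0.8164966, 1.825741, 0.0, 0.0],
            [0.4082483, 0.3651483, 1.923538, 0.0],
            [-0.4082483, 0.1825741, -0.4678876, 1.606574]]

# Scale column j of L by sqrt(d_j) to obtain L * D^{1/2}.
scaled = [[L[i][j] * math.sqrt(D[j]) for j in range(4)] for i in range(4)]

max_difference = max(abs(scaled[i][j] - reported[i][j])
                     for i in range(4) for j in range(4))
```

The largest entrywise difference stays at the level of the rounding in the printed values.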
7. The modified factorization algorithm gives the following results.
a. x_1 = 1, x_2 = -1, x_3 = 0  b. x_1 = 0.2, x_2 = -0.2, x_3 = -0.2, x_4 = 0.25
c. x_1 = 1, x_2 = 2, x_3 = -1, x_4 = 2  d. x_1 = -0.8586387, x_2 = 2.418848, x_3 = -0.9581152, x_4 = -1.272251
9. The modified Cholesky's algorithm gives the following results.
a. x_1 = 1, x_2 = -1, x_3 = 0  b. x_1 = 0.2, x_2 = -0.2, x_3 = -0.2, x_4 = 0.25
c. x_1 = 1, x_2 = 2, x_3 = -1, x_4 = 2  d. x_1 = -0.85863874, x_2 = 2.4188482, x_3 = -0.95811518, x_4 = -1.2722513
11. The Crout Factorization Algorithm gives the following results.
a. x_1 = 0.5, x_2 = 0.5, x_3 = 1  b. x_1 = -0.9999995, x_2 = 1.999999, x_3 = 1
c. x_1 = 1, x_2 = -1, x_3 = 0  d. x_1 = -0.09357798, x_2 = 1.587156, x_3 = -1.167431, x_4 = 0.5412844
13. We have x; = 1, for eachi=1,..., 10. 
15. Only the matrix in (d) is positive definite. 
17. -2<a< 3 
19. 0 < β < 1 and 3 < α < 5 - β


21. 


~ 


. No, for example, consider R | ; 
b. Yes, since $A = A^t$.
c. Yes, since $x^t(A + B)x = x^tAx + x^tBx$.

d. Yes, since $x^tA^2x = x^tA^tAx = (Ax)^t(Ax) \ge 0$, and because $A$ is nonsingular, equality holds only if $x = 0$.


a0 me 


e. No, for example, consider A = E | and B= Re Y |: 


1 0 10 
23. a. Since det A = 3α - 2β, A is singular if and only if α = 2β/3.  b. |α| > 1, |β| < 1
ce p=1 d.a>,p=1 
25. One example is A = i il ; 


27. 


0.1 1.0 

The Crout Factorization Algorithm can be rewritten as follows:
Step 1 Set l_1 = a_1; u_1 = c_1/l_1.
Step 2 For i = 2,...,n-1 set l_i = a_i - b_i u_{i-1}; u_i = c_i/l_i.
Step 3 Set l_n = a_n - b_n u_{n-1}.
Step 4 Set z_1 = d_1/l_1.
Step 5 For i = 2,...,n set z_i = (d_i - b_i z_{i-1})/l_i.
Step 6 Set x_n = z_n.
Step 7 For i = n-1,...,1 set x_i = z_i - u_i x_{i+1}.
Step 8 OUTPUT (x_1,...,x_n);
STOP.
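
The eight steps above can be sketched in code as follows. This is an illustrative reading, not the text's own listing: it assumes that `a` holds the diagonal, `b` the subdiagonal (with `b[0]` unused), `c` the superdiagonal (with the last entry unused), and `d` the right-hand side, and it assumes n >= 2 and that no `l[i]` is zero. The test system is hypothetical.

```python
from fractions import Fraction


def crout_tridiagonal(a, b, c, d):
    """Solve a tridiagonal system using the rewritten Crout steps (0-based indices)."""
    n = len(a)
    l = [None] * n
    u = [None] * n
    z = [None] * n
    x = [None] * n

    l[0] = a[0]                      # Step 1
    u[0] = c[0] / l[0]
    for i in range(1, n - 1):        # Step 2
        l[i] = a[i] - b[i] * u[i - 1]
        u[i] = c[i] / l[i]
    l[n - 1] = a[n - 1] - b[n - 1] * u[n - 2]   # Step 3

    z[0] = d[0] / l[0]               # Step 4
    for i in range(1, n):            # Step 5
        z[i] = (d[i] - b[i] * z[i - 1]) / l[i]

    x[n - 1] = z[n - 1]              # Step 6
    for i in range(n - 2, -1, -1):   # Step 7
        x[i] = z[i] - u[i] * x[i + 1]
    return x                         # Step 8


# Hypothetical system with diagonal 2 and off-diagonals -1; exact solution (1, 2, 3).
F = Fraction
a = [F(2), F(2), F(2)]
b = [F(0), F(-1), F(-1)]
c = [F(-1), F(-1), F(0)]
d = [F(0), F(0), F(4)]
solution = crout_tridiagonal(a, b, c, d)
```

Only the three diagonals are stored, so the work and storage grow linearly with n instead of cubically.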
29. i_1 = 0.6785047, i_2 = 0.4214953, i_3 = 0.2570093, i_4 = 0.1542056, i_5 = 0.1028037
31. a. Mating male i with female j produces offspring with the same wing characteristics as mating male j with female i.
b. No. Consider, for example, x = (1, 0, -1)^t.
Exercise Set 7.1 (Page 441)


1. a. We have $\|x\|_\infty = 4$ and $\|x\|_2 = 5.220153$. b. We have $\|x\|_\infty = 4$ and $\|x\|_2 = 5.477226$.
c. We have $\|x\|_\infty = 2^k$ and $\|x\|_2 = (1 + 4^k)^{1/2}$.
d. We have $\|x\|_\infty = 4/(k+1)$ and $\|x\|_2 = \left(16/(k+1)^2 + 4/k^4 + k^4e^{-2k}\right)^{1/2}$.
3. a. We have $\lim_{k\to\infty} x^{(k)} = (0, 0, 0)^t$. b. We have $\lim_{k\to\infty} x^{(k)} = (0, 1, 3)^t$.
c. We have $\lim_{k\to\infty} x^{(k)} = (0, 0, \tfrac{1}{2})^t$. d. We have $\lim_{k\to\infty} x^{(k)} = (1, -1, 1)^t$.
5. a. We have ||x — X||oo = 8.57 x 1074 and ||AX — bl|oo = 2.06 x 107+. 
b. We have ||x — &||,. = 0.90 and ||A% — b] |. = 0.27. 
c. We have ||x — X||,. = 0.5 and ||AX — b||., = 0.3. 
d. We have ||x — &||,. = 6.55 x 10-7, and ||AX — b]|,, = 0.32. 
7. Let A= F and B = F if Then ||AB||o = 2, but ||Allo - ||Bllo = 1. 


9. b. We have
4a. $\|A\|_F = \sqrt{326}$
4b. $\|A\|_F = \sqrt{326}$
4c. $\|A\|_F = 4$
4d. $\|A\|_F = \sqrt{148}$.
15. First note that the right-hand side of the inequality is unchanged if $x$ is replaced by any vector $\hat{x}$ with $|x_i| = |\hat{x}_i|$ for each
$i = 1, 2, \ldots, n$. Then choose the new vector $\hat{x}$ so that $\hat{x}_iy_i \ge 0$ for each $i$, and apply the inequality to $\hat{x}$ and $y$.


Exercise Set 7.2 (Page 449)


1. a. The eigenvalue $\lambda_1 = 3$ has the eigenvector $x_1 = (1, -1)^t$, and the eigenvalue $\lambda_2 = 1$ has the eigenvector $x_2 = (1, 1)^t$.


( a) 
x=j]{1, 5) ; 


b. The eigenvalue A; = oe has the eigenvector 


and the eigenvalue A, = 1v5 has the eigenvector 


c. The eigenvalue $\lambda_1 = \tfrac{1}{2}$ has the eigenvector $x_1 = (1, 1)^t$, and the eigenvalue $\lambda_2 = -\tfrac{1}{2}$ has the eigenvector $x_2 = (1, -1)^t$.

d. The eigenvalue $\lambda_1 = \lambda_2 = 3$ has the eigenvectors $x_1 = (0, 0, 1)^t$ and $x_2 = (1, 1, 0)^t$, and the eigenvalue $\lambda_3 = 1$ has the
eigenvector $x_3 = (-1, 1, 0)^t$.

e. The eigenvalue $\lambda_1 = 7$ has the eigenvector $x_1 = (1, 4, 4)^t$, the eigenvalue $\lambda_2 = 3$ has the eigenvector $x_2 = (1, 2, 0)^t$, and
the eigenvalue $\lambda_3 = -1$ has the eigenvector $x_3 = (1, 0, 0)^t$.

f. The eigenvalue $\lambda_1 = 5$ has the eigenvector $x_1 = (1, 2, 1)^t$, and the eigenvalue $\lambda_2 = \lambda_3 = 1$ has the eigenvectors
$x_2 = (-1, 0, 1)^t$ and $x_3 = (-1, 1, 0)^t$.

3. a. The eigenvalues $\lambda_1 = 2 + \sqrt{2}\,i$ and $\lambda_2 = 2 - \sqrt{2}\,i$ have eigenvectors $x_1 = (-\sqrt{2}\,i, 1)^t$ and $x_2 = (\sqrt{2}\,i, 1)^t$.

b. The eigenvalues $\lambda_1 = (3 + \sqrt{7}\,i)/2$ and $\lambda_2 = (3 - \sqrt{7}\,i)/2$ have eigenvectors $x_1 = ((1 - \sqrt{7}\,i)/2, 1)^t$ and
$x_2 = ((1 + \sqrt{7}\,i)/2, 1)^t$.


5. a. 3 b. 1 c. 4 d. 3 e. 7 ee 

7. Only the matrix in 1(c) is convergent. 

9. a. 3 b. 1.618034 c. 0.5 d. 3 e. 8.224257 f. 5.203527 
11. Since 


RIF ee 
oo 
Le 


1 0 
Ak = | 2-1 9-« |, we have lim A‘ = 
k>00 
Also,
Qk 0 
: 0 O 
k . k 
ae ae ee jim at =[f al 


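13. Let $A$ be an $n \times n$ matrix. Expanding across the first row gives the characteristic polynomial

$$p(\lambda) = \det(A - \lambda I) = (a_{11} - \lambda)M_{11} + \sum_{j=2}^{n}(-1)^{j+1}a_{1j}M_{1j}.$$

The determinants $M_{1j}$ are of the form

$$M_{1j} = \det\begin{bmatrix}
a_{21} & a_{22}-\lambda & \cdots & a_{2,j-1} & a_{2,j+1} & \cdots & a_{2n} \\
a_{31} & a_{32} & \cdots & a_{3,j-1} & a_{3,j+1} & \cdots & a_{3n} \\
\vdots & \vdots & & \vdots & \vdots & & \vdots \\
a_{j-1,1} & a_{j-1,2} & \cdots & a_{j-1,j-1}-\lambda & a_{j-1,j+1} & \cdots & a_{j-1,n} \\
a_{j1} & a_{j2} & \cdots & a_{j,j-1} & a_{j,j+1} & \cdots & a_{jn} \\
a_{j+1,1} & a_{j+1,2} & \cdots & a_{j+1,j-1} & a_{j+1,j+1}-\lambda & \cdots & a_{j+1,n} \\
\vdots & \vdots & & \vdots & \vdots & & \vdots \\
a_{n1} & a_{n2} & \cdots & a_{n,j-1} & a_{n,j+1} & \cdots & a_{nn}-\lambda
\end{bmatrix},$$

for $j = 2,\ldots,n$. Note that each $M_{1j}$ has $n - 2$ entries of the form $a_{ii} - \lambda$. Thus,

$$p(\lambda) = \det(A - \lambda I) = (a_{11} - \lambda)M_{11} + \{\text{terms of degree } n - 2 \text{ or less}\}.$$

Since

$$M_{11} = \det\begin{bmatrix}
a_{22}-\lambda & a_{23} & \cdots & a_{2n} \\
a_{32} & a_{33}-\lambda & \ddots & \vdots \\
\vdots & \ddots & \ddots & a_{n-1,n} \\
a_{n2} & \cdots & a_{n,n-1} & a_{nn}-\lambda
\end{bmatrix}$$

is of the same form as $\det(A - \lambda I)$, the same argument can be repeatedly applied to determine

$$p(\lambda) = (a_{11} - \lambda)(a_{22} - \lambda)\cdots(a_{nn} - \lambda) + \{\text{terms of degree } n - 2 \text{ or less in } \lambda\}.$$

Thus, $p(\lambda)$ is a polynomial of degree $n$.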
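15. a. $\det(A - \lambda I) = \det((A - \lambda I)^t) = \det(A^t - \lambda I)$
b. If $Ax = \lambda x$, then $A^2x = \lambda Ax = \lambda^2x$, and by induction, $A^kx = \lambda^kx$.
c. If $Ax = \lambda x$ and $A^{-1}$ exists, then $x = \lambda A^{-1}x$. By Exercise 8 (b), $\lambda \ne 0$, so $\frac{1}{\lambda}x = A^{-1}x$.

d. Since $A^{-1}x = \frac{1}{\lambda}x$, we have $(A^{-1})^2x = \frac{1}{\lambda}A^{-1}x = \frac{1}{\lambda^2}x$. Mathematical induction gives

$$(A^{-1})^kx = \frac{1}{\lambda^k}x.$$

e. If $Ax = \lambda x$, then
$$q(A)x = q_0x + q_1Ax + \cdots + q_kA^kx = q_0x + q_1\lambda x + \cdots + q_k\lambda^kx = q(\lambda)x.$$
f. Let $A - \alpha I$ be nonsingular. Since $Ax = \lambda x$,
$$(A - \alpha I)x = Ax - \alpha Ix = \lambda x - \alpha x = (\lambda - \alpha)x.$$
Thus
$$\frac{1}{\lambda - \alpha}x = (A - \alpha I)^{-1}x.$$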
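
Parts (b) through (f) can be illustrated on a small example. The sketch below is only a spot check under stated assumptions: the symmetric matrix `A`, its eigenpair (3, (1, 1)^t), the polynomial q(t) = 1 + 2t + t^2, and the shift alpha = 5 are all hypothetical choices, and the helper names are not from the text.

```python
from fractions import Fraction as F


def matvec(M, v):
    return [sum(M[i][j] * v[j] for j in range(len(v))) for i in range(len(M))]


def inverse_2x2(M):
    (a, b), (c, d) = M
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]


# Hypothetical matrix with eigenvalue 3 and eigenvector (1, 1)^t.
A = [[F(2), F(1)], [F(1), F(2)]]
lam = F(3)
x = [F(1), F(1)]

# (b) A^k x = lambda^k x, checked for k = 4.
v = x
for _ in range(4):
    v = matvec(A, v)
power_check = v == [lam**4 * t for t in x]

# (c) A^{-1} x = (1/lambda) x.
inverse_check = matvec(inverse_2x2(A), x) == [t / lam for t in x]

# (e) q(A) x = q(lambda) x for q(t) = 1 + 2t + t^2.
q = [F(1), F(2), F(1)]
qA_x = [F(0), F(0)]
term = x
for coef in q:
    qA_x = [s + coef * t for s, t in zip(qA_x, term)]
    term = matvec(A, term)
q_lam = sum(coef * lam**k for k, coef in enumerate(q))
polynomial_check = qA_x == [q_lam * t for t in x]

# (f) (A - alpha I)^{-1} x = x / (lambda - alpha), with alpha = 5.
alpha = F(5)
shifted = [[A[i][j] - (alpha if i == j else 0) for j in range(2)] for i in range(2)]
shift_check = matvec(inverse_2x2(shifted), x) == [t / (lam - alpha) for t in x]
```

All four checks hold for this example; the shift alpha must avoid the eigenvalues of A (here 1 and 3) so that A - alpha I is nonsingular.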
17. a. We have the real eigenvalue $\lambda = 1$ with the eigenvector $x = (6, 3, 1)^t$.
b. Choose any multiple of the vector $(6, 3, 1)^t$.
19. Let $Ax = \lambda x$. Then $|\lambda|\,\|x\| = \|Ax\| \le \|A\|\,\|x\|$, which implies $|\lambda| \le \|A\|$. Also, $(1/\lambda)x = A^{-1}x$ so $1/|\lambda| \le \|A^{-1}\|$ and
$\|A^{-1}\|^{-1} \le |\lambda|$.
Exercise Set 7.3 (Page 459)
1. Two iterations of Jacobi’s method gives the following results. 
a. (0.1428571, —0.3571429, 0.4285714)' b. (0.97, 0.91, 0.74)! 
c. (—0.65, 1.65, —0.4, —2.475)' d. (1.325, —1.6, 1.6, 1.675, 2.425)! 
3. Two iterations of the Gauss-Seidel method give the following results. 
a. (0.1111111, —0.2222222, 0.6190476)' b. (0.979, 0.9495, 0.7899)’ 
c. (—0.5, 2.64, —0.336875, —2.267375)' d. (1.189063, —1.521354, 1.862396, 1.882526, 2.255645)’ 
5. Jacobi’s Algorithm gives the following results. 
a. x) — (0,03507839, —0.2369262, 0.6578015)' b. x = (0.9957250, 0.9577750, 0.7914500)' 


Cc. 


x) = (—0.7975853, 2.794795, —0.2588888, —2.251879)! 


d. x“ = (—0.7529267, 0.04078538, —0.2806091, 0.6911662)' 
7. The Gauss-Seidel Algorithm gives the following results. 
a. x = (0.03535107, —0.2367886, 0.6577590)! b. x® = (0.9957475, 0.9578738, 0.7915748)! 
ce. x" = (—0.7973091, 2.794982, —0.2589884, —2.251798)' 
d. x = (0.7866825, —1.002719, 1.866283, 1.912562, 1.989790)! 
9. a 
0 4-3 _ 
T=|-1 0 -l and i Ma ai oar 
2 37 0 
Thus, the eigenvalues of 7; are 0 and +%3i, so p(T;) = % = 1s 
b. x? = (—20.827873, 2.0000000, —22.827873)! 
0 2 5 1\? 
T,=|0 -+ —-3 and det(Al — T,) =A (2+ ;) . 
0 0 -} 
Thus, the eigenvalues of T, are 0, —1/2, and —1/2; and p(T,) = 1/2. 
d. x?) = (1.0000023, 1.9999975, —1.0000001)' is within 10~> in the /,. norm. 
11. a. A is not strictly diagonally dominant. 
0 0 1 
T=|905 0 0.25 and = e(7;) = 0.97210521. 
—-1 0.5 0 


Since 7; is convergent, the Jacobi method will converge. 


. With x = (0,0, 0), x8? = (0.90222655, —0.79595242, 0.69281316)’. 
. p(T;) = 1.39331779371. Since T; is not convergent, the Jacobi method will not converge. 
13. a. Subtract $x = Tx + c$ from $x^{(k)} = Tx^{(k-1)} + c$ to obtain $x^{(k)} - x = T(x^{(k-1)} - x)$. Thus,
$$\|x^{(k)} - x\| \le \|T\|\,\|x^{(k-1)} - x\|.$$
Inductively, we have
$$\|x^{(k)} - x\| \le \|T\|^k\,\|x^{(0)} - x\|.$$

The remainder of the proof is similar to the proof of Corollary 2.5.
b. The last column has no entry when $\|T\|_\infty = 1$.

Column headings: (1) $\|x^{(2)} - x\|_\infty$, (2) $\|T\|_\infty$, (3) $\|T\|_\infty^2\,\|x^{(0)} - x\|_\infty$, (4) $\frac{\|T\|_\infty^2}{1 - \|T\|_\infty}\,\|x^{(1)} - x^{(0)}\|_\infty$

         (1)         (2)         (3)         (4)
1 (a)    0.22932     0.857143    0.48335     2.9388
1 (b)    0.051579    0.3         0.089621    0.11571
1 (c)    1.1453      0.9         2.2642      20.25
1 (d)    0.27511     1           0.75342
1 (e)    0.59743     1           1.9897
1 (f)    0.875       0.75        1.125       3.375
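
The two error bounds tabulated above can be watched in action on a small Jacobi iteration. The following sketch is an illustration under stated assumptions: the 2 x 2 system is hypothetical (it is not one of the exercise systems), and the variable names are chosen here. Exact rational arithmetic is used so the inequalities can be tested without rounding concerns.

```python
from fractions import Fraction as F

# Hypothetical strictly diagonally dominant system with exact solution (1, 2).
A = [[F(4), F(1)], [F(2), F(5)]]
x_exact = [F(1), F(2)]
n = len(A)
b = [sum(A[i][j] * x_exact[j] for j in range(n)) for i in range(n)]

# Jacobi iteration x^(k) = T x^(k-1) + c with T = -D^{-1}(L + U), c = D^{-1} b.
T = [[F(0) if i == j else -A[i][j] / A[i][i] for j in range(n)] for i in range(n)]
c = [b[i] / A[i][i] for i in range(n)]


def inf_norm(v):
    return max(abs(t) for t in v)


def matrix_inf_norm(M):
    return max(sum(abs(t) for t in row) for row in M)


norm_T = matrix_inf_norm(T)

x0 = [F(0), F(0)]
iterates = [x0]
for _ in range(10):
    prev = iterates[-1]
    iterates.append([sum(T[i][j] * prev[j] for j in range(n)) + c[i] for i in range(n)])

errors = [inf_norm([xk[i] - x_exact[i] for i in range(n)]) for xk in iterates]
first_step = inf_norm([iterates[1][i] - x0[i] for i in range(n)])

# Bound (3): ||T||^k ||x^(0) - x||;  bound (4): ||T||^k / (1 - ||T||) ||x^(1) - x^(0)||.
bound_3 = [norm_T**k * errors[0] for k in range(11)]
bound_4 = [norm_T**k / (1 - norm_T) * first_step for k in range(11)]
```

Both bounds are usually pessimistic; the second is the practical one because it does not require knowing the exact solution.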


15. The results for this exercise are listed on page 827 in Exercise 11, where additional results are given for a method presented 
in Section 7.4. 


Exercise Set 7.4 (Page 467) 


1. Two iterations of the SOR method give the following results. 
a. (0.05410079, —0.2115435, 0.6477159)' b. (0.9876790, 0.9784935, 0.7899328)' 
c. (—0.71885, 2.818822, —0.2809726, —2.235422)' d. (1.079675, —1.260654, 2.042489, 1.995373, 2.049536)’ 
3. Two iterations of the SOR method with w = 1.3 give the following results. 
a. (—0.1040103, —0.1331814, 0.6774997)' 
b. (0.957073, 0.9903875, 0.7206569)' 
ec. (—1.23695, 3.228752, —0.1523888, —2.041266)' 
d. (0.7064258, —0.4103876, 2.417063, 2.251955, 1.061507)’ 
5. The SOR Algorithm gives the following results. 
a. x?) — (0.03488469, —0.2366474, 0.6579013)' 
b. x = (0.9958341, 0.9579041, 0.7915756)! 
c. x® = (—0.7976009, 2.795288, —0.2588293, —2.251768)' 
d. x™ = (—0.7534489, 0.04106617, —0.2808146, 0.6918049)! 
e. x“ — (0.7866310, — 1.002807, 1.866530, 1.912645, 1.989792)' 
f. x = (0.9999442, 1.999934, 1.000033, 1.999958, 0.99998 15, 2.000007)! 
7. The tridiagonal matrices are in parts (b) and (c). 
(1b): For m = 1.012823 we have x“ = (0.9957846, 0.9578935, 0.7915788)'. 
(1c): For @ = 1.153499 we have x = (—0.7977651, 2.795343, —0.2588021, —2.251760)'. 
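9. Let $\lambda_1,\ldots,\lambda_n$ be the eigenvalues of $T_\omega$. Then

$$\prod_{i=1}^{n}\lambda_i = \det T_\omega = \det\left((D - \omega L)^{-1}[(1 - \omega)D + \omega U]\right)$$

$$= \det(D - \omega L)^{-1}\det((1 - \omega)D + \omega U) = \det\left(D^{-1}\right)\det((1 - \omega)D)$$

$$= \left(\frac{1}{a_{11}a_{22}\cdots a_{nn}}\right)\left((1 - \omega)^na_{11}a_{22}\cdots a_{nn}\right) = (1 - \omega)^n.$$

Thus
$$\rho(T_\omega) = \max_{1 \le i \le n}|\lambda_i| \ge |\omega - 1|,$$

and $|\omega - 1| < 1$ if and only if $0 < \omega < 2$.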
11.      Jacobi          Gauss-Seidel    SOR (ω = 1.2)
         33              8               13
         iterations      iterations      iterations

xy 1.53873501 1.53873270 1.53873549 

X2 0.73142167 0.73141966 0.73 142226 

x3 0.10797136 0.10796931 0.10797063 

x4 0.17328530 0.17328340 0.17328480 

Xs 0.04055865 0.04055595 0.04055737 

X6 0.08525019 0.08524787 0.08524925 

x7 0.16645040 0.16644711 0.16644868 

xg 0.12198156 0.12197878 0.12198026 

Xo 0.10125265 0.10124911 0.10125043 
X10 0.09045966 0.09045662 0.09045793 
X14 0.07203 172 0.07202785 0.07202912 
X12 0.07026597 0.07026266 0.07026392 
X13 0.06875835 0.06875421 0.06875546 
x14 0.06324659 0.06324307 0.06324429 
X15 0.05971510 0.0597 1083 0.0597 1200 
X16 0.05571199 0.05570834 0.05570949 
X17 0.05187851 0.05187416 0.05187529 
X1g 0.04924911 0.04924537 0.04924648 
X19 0.04678213 0.04677776 0.04677885 
X20 0.04448679 0.04448303 0.04448409 
X21 0.04246924 0.04246493 0.04246597 
X22 0.04053818 0.04053444 0.04053546 
X03 0.03877273 0.03876852 0.03876952 
X04 0.03718190 0.03717822 0.03717920 
X25 0.03570858 0.03570451 0.03570548 
X26 0.03435 107 0.03434748 0.03434844 
Xo7 0.03309542 0.03309 152 0.03309246 
Xog 0.03192212 0.03191866 0.03191958 
X29 0.03083007 0.03082637 0.03082727 
X30 0.02980997 0.02980666 0.02980755 
X31 0.02885510 0.02885 160 0.02885248 
X32 0.02795937 0.02795621 0.02795707 
X33 0.02711787 0.02711458 0.02711543 
X34 0.02632478 0.02632179 0.02632262 
X35 0.02557705 0.02557397 0.02557479 
X36 0.02487017 0.02486733 0.024868 14 
X37 0.02420147 0.02419858 0.02419938 
X38 0.02356750 0.02356482 0.02356560 
X39 0.02296603 0.02296333 0.02296410 
X40 0.02239424 0.02239171 0.02239247 
X44 0.02185033 0.02184781 0.02184855 
X42 0.02133203 0.02132965 0.02133038 
X43 0.02083782 0.02083545 0.02083615 
X44 0.02036585 0.02036360 0.02036429 
X45 0.01991483 0.01991261 0.01991324 
X46 0.01948325 0.01948113 0.01948175 
x47 0.01907002 0.01906793 0.01906846 
X4g 0.01867387 0.01867187 0.01867239 
X49 0.01829386 0.01829190 0.01829233 
x50 0.01792896 0.01792707 0.01792749 
X51 0.01757833 0.01757648 0.01757683 
X50 0.01724113 0.01723933 0.01723968 
X53 0.01691660 0.01691487 0.01691517 
X54 0.01660406 0.01660237 0.01660267 
X55 0.01630279 0.01630127 0.01630146 
X56 0.01601230 0.01601082 0.01601101 
X57 0.01573198 0.01573087 0.01573077 
X58 0.01546129 0.01546020 0.01546010 
X59 0.01519990 0.01519909 0.01519878 
X60 0.01494704 0.01494626 0.01494595 
X61 0.01470181 0.01470085 0.01470077 
X62 0.01446510 0.01446417 0.01446409 
X63 0.01423556 0.01423437 0.01423461 
X64 0.01401350 0.01401233 0.01401256 
X65 0.01380328 0.01380234 0.01380242 
X66 0.01359448 0.01359356 0.01359363 
X67 0.01338495 0.01338434 0.01338418 
X68 0.01318840 0.01318780 0.01318765 
X69 0.01297174 0.01297109 0.01297107 
x70 0.01278663 0.01278598 0.01278597 
x7 0.01270328 0.01270263 0.01270271 
x72 0.01252719 0.01252656 0.01252663 
x73 0.01237700 0.01237656 0.01237654 
x74 0.01221009 0.01220965 0.01220963 
x75 0.01129043 0.01129009 0.01129008 
X16 0.01114138 0.01114104 0.01114102 
x77 0.01217337 0.01217312 0.01217313 
X78 0.01201771 0.01201746 0.01201746 
X79 0.01542910 0.01542896 0.01542896 
Xg0 0.01523810 0.01523796 0.01523796 


Exercise Set 7.5 (Page 476)


1. The $\|\cdot\|_\infty$ condition numbers are:

a. 50 b. 241.37 c. 600,002 d. 339,866
3.   $\|x - \tilde{x}\|_\infty$   $K_\infty(A)\|b - A\tilde{x}\|_\infty/\|A\|_\infty$

a. 8.571429 x 107+ 1.238095 x 10-7 

b. 0.1 3.832060 

c. 0.04 0.8 

d. 20 1.152440 x 10° 


5. Gaussian elimination and iterative refinement give the following results.
a. (i) (-10.0, 1.01)^t, (ii) (10.0, 1.00)^t
b. (i) (12.0, 0.499, -1.98)^t, (ii) (1.00, 0.500, -1.00)^t
c. (i) (0.185, 0.0103, -0.0200, -1.12)^t, (ii) (0.177, 0.0127, -0.0207, -1.18)^t
d. (i) (0.799, -3.12, 0.151, 4.56)^t, (ii) (0.758, -3.00, 0.159, 4.30)^t
7. The matrix is ill-conditioned since K_∞ = 60002. We have x = (-1.0000, 2.0000)^t.


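9. For any vector $x$, we have

$$\|x\| = \|A^{-1}Ax\| \le \|A^{-1}\|\,\|Ax\|, \quad\text{so}\quad \|Ax\| \ge \frac{\|x\|}{\|A^{-1}\|}.$$

Let $x \ne 0$ be such that $\|x\| = 1$ and $Bx = 0$. Then

$$\|(A - B)x\| = \|Ax\| \ge \frac{1}{\|A^{-1}\|}$$
and
$$\frac{\|(A - B)x\|}{\|A\|} \ge \frac{1}{\|A\|\,\|A^{-1}\|} = \frac{1}{K(A)}.$$

Since $\|x\| = 1$,

$$\|(A - B)x\| \le \|A - B\|\,\|x\| = \|A - B\| \quad\text{and}\quad \frac{\|A - B\|}{\|A\|} \ge \frac{1}{K(A)}.$$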

11, a. Ko (H) = 28,375 
b. Koo (H) = 943, 656 
c. actual solution x = (—124, 1560, —3960, 2660)’; ; 
approximate solution ¥ = (—124.2, 1563.8, —3971.8, 2668.8)’; ||x — Xlloo = 11.8; =e = 0.02980; 


IIXlloo 
KA) I5b|lo0 | ISAlloo 28375 6.6 x 10-6 
ai) | [lo [Allo J ciao) |°* — 2.083 
1 = Koo(A) ( Wttes ) L Mblloo wo | 128375 (s8e*) 
= 0.09987. 


Exercise Set 7.6 (Page 492) 




1. a. (0.18, 0.13)! 
b. (0.19, 0.10)’ 
c. Gaussian elimination gives the best answer since v? = (0,0)‘ in the conjugate gradient method. 
d. (0.13, 0.21)’. There is no improvement, although v? + 0. 
3. a. (1.00, —1.00, 1.00)’ 
b. (0.827, 0.0453, —0.0357)' 
c. Partial pivoting and scaled partial pivoting also give (1.00, —1.00, 1.00)’. 
d. (0.776, 0.238, —0.185)’; 
The residual from (3b) is (—0.0004, —0.0038, 0.0037)’, and the residual from part (3d) is (0.0022, —0.0038,0.0024)’. 
There does not appear to be much improvement, if any. Rounding error is more prevalent because of the increase in the 
number of matrix multiplications. 
5. a. x? = (0.1535933456, —0.1697932117, 0.5901172091)', ||r@ ||. = 0.221. 
b. x® = (0.9993129510, 0.9642734456, 0.7784266575)’, ||r |lo0 = 0.144. 
ce. x? = (—0.7290954114, 2.515782452, —0.6788904058, —2.331943982)', |r? ||. = 2.2. 
d. x® = (—0.7071108901, —0.0954748881, —0.3441074093, 0.5256091497)', |[r@ ||. = 0.39. 
e. x = (0.533596838 1, 0.9367588935, 1.339920949, 1.743083004, 1.743083004)’, |r? ||. = 1.3. 
f. x® = (1.022375671, 1.686451893, 1.022375671, 2.0609 19568, 0.8310997764, 2.060919568)', Ir? ||. = 1.13. 
7. a. x = (0.06185567013, —0.1958762887, 0.6185567010)', |r ||, = 0.4 x 107°. 
b. x® = (0.9957894738, 0.9578947369, 0.7915789474)', ||r® ||, = 0.1 x 107°. 




ce. x = (—0.7976470579, 2.795294120, —0.2588235305, —2.251764706)', ||r ||, = 0.39 x 1077. 
d. x“ = (—0.7534246575, 0.04109589039, —0.2808219179, 0.6917808219)', |[r |]o = 0.11 x 107°. 
e. x© = (0.4516129032, 0.7096774197, 1.677419355, 1.741935483, 1.806451613)', |[r ||. = 0.2 x 107°. 
f. x = (1.000000000, 2.000000000, 1.000000000, 2.000000000, 0.9999999997, 2.000000000)', |r ||, = 0.44 x 107°. 
9. a. Jacobi Gauss-Seidel SOR (ω = 1.3) Conjugate Gradient 
49 28 13 9 
iterations iterations iterations iterations 
xX} 0.93406183 0.93406917 0.93407584 0.93407713 
X2 0.97473885 0.97475285 0.97476180 0.97476363 
X3 1.10688692 1.10690302 1.10691093 1.10691243 
x4 1.42346150 1.42347226 1.42347591 1.42347699 
Xs 0.85931331 0.85932730 0.85933633 0.85933790 
X6 0.80688119 0.80690725 0.80691961 0.80692197 
x7 0.85367746 0.85370564 0.85371536 0.85372011 
Xg 1.10688692 1.10690579 1.10691075 1.10691250 
Xo 0.87672774 0.87674384 0.87675177 0.87675250 
X10 0.80424512 0.80427330 0.80428301 0.80428524 
Xi 0.80688119 0.80691173 0.80691989 0.80692252 
X12 0.97473885 0.97475850 0.97476265 0.97476392 
X13 0.93003466 0.93004542 0.93004899 0.93004987 
X14 0.87672774 0.87674661 0.87675155 0.87675298 
X15 0.85931331 0.85933296 0.85933709 0.85933979 
X16 0.93406183 0.93407462 0.93407672 0.93407768 
b. Jacobi Gauss-Seidel SOR (ω = 1.2) Conjugate Gradient 
60 35 23 11 
iterations iterations iterations iterations 
x) 0.39668038 0.39668651 0.39668915 0.39669775 
XQ 0.07175540 0.07176830 0.07177348 0.07178516 
X3 —0.23080396 —0.23078609 —0.23077981 —0.23076923 
x4 0.24549277 0.24550989 0.24551535 0.24552253 
X5 0.834054 12 0.834065 16 0.83406823 0.83407148 
X6 0.51497606 0.51498897 0.51499414 0.51500583 
x7 0.12116003 0.12118683 0.12119625 0.12121212 
Xg —0.24044414 —0.24040991 —0.24039898 —0.24038462 
Xo 0.37873579 0.37876891 0.37877812 0.37878788 
X10 1.09073364 1.09075392 1.09075899 1.09076341 
XxX 0.54207872 0.54209658 0.54210286 0.54211344 
X12 0.13838259 0.13841682 0.13842774 0.13844211 
X13 —0.23083868 —0.23079452 —0.23078224 —0.23076923 
X14 0.41919067 0.41923 122 0.41924136 0.41925019 
X15 1.15015953 1.15018477 1.15019025 1.15019425 
X16 0.51497606 0.51499318 0.51499864 0.51500583 
x7 0.12116003 0.12119315 0.12120236 0.12121212 
X18 —0.24044414 —0.24040359 —0.24039345 —0.24038462 
X19 0.37873579 0.37877365 0.37878188 0.37878788 
X29 1.09073364 1.09075629 1.09076069 1.09076341 
X21 0.39668038 0.39669 142 0.39669449 0.39669775 
X22 0.07175540 0.07177567 0.07178074 0.07178516 
X93 —0.23080396 —0.23077872 —0.23077323 —0.23076923 
X24 0.24549277 0.2455 1542 0.24551982 0.24552253 
X25 0.834054 12 0.83406793 0.83407025 0.83407148 
c. Jacobi Gauss-Seidel SOR (ω = 1.1) Conjugate Gradient 
15 9 8 8 
iterations iterations iterations iterations 

x} —3.07611424 —3.07611739 —3.07611796 —3.07611794 

Xo —1.65223176 —1.65223563 —1.65223579 —1.65223582 

X3 —0.53282391 —0.53282528 —0.53282531 —0.53282528 

x4 —0.04471548 —0.04471608 —0.04471609 —0.04471604 

x5 0.17509673 0.17509661 0.17509661 0.17509661 

X6 0.29568226 0.29568223 0.29568223 0.29568218 

x7 0.37309012 0.37309011 0.37309011 0.37309011 

Xg 0.42757934 0.42757934 0.42757934 0.42757927 

Xo 0.46817927 0.46817927 0.46817927 0.46817927 
X10 0.49964748 0.49964748 0.49964748 0.49964748 
Xi 0.52477026 0.52477026 0.52477026 0.52477027 
X12 0.54529835 0.54529835 0.54529835 0.54529836 
X13 0.56239007 0.56239007 0.56239007 0.56239009 
X14 0.57684345 0.57684345 0.57684345 0.57684347 
X15 0.58922662 0.58922662 0.58922662 0.58922664 
X16 0.59995522 0.59995522 0.59995522 0.59995523 
X17 0.60934045 0.60934045 0.60934045 0.60934045 
X18 0.61761997 0.61761997 0.61761997 0.61761998 
X19 0.62497846 0.62497846 0.62497846 0.62497847 
X20 0.63156161 0.63156161 0.63156161 0.63156161 
X21 0.63748588 0.63748588 0.63748588 0.63748588 
X20 0.64284553 0.64284553 0.64284553 0.64284553 
X03 0.64771764 0.64771764 0.64771764 0.64771764 
X4 0.65216585 0.65216585 0.65216585 0.65216585 
X25 0.65624320 0.65624320 0.65624320 0.65624320 
X26 0.65999423 0.65999423 0.65999423 0.65999422 
X97 0.66345660 0.66345660 0.66345660 0.66345660 
X2g 0.66666242 0.66666242 0.66666242 0.66666242 
X29 0.66963919 0.669639 19 0.66963919 0.66963919 
X30 0.67241061 0.67241061 0.67241061 0.67241060 
X31 0.67499722 0.67499722 0.67499722 0.67499721 
x32 0.67741692 0.67741692 0.67741691 0.67741691 
X33 0.67968535 0.67968535 0.67968535 0.67968535 
X34 0.68181628 0.68181628 0.68181628 0.68181628 
X35 0.68382184 0.68382184 0.68382184 0.68382184 
X36 0.68571278 0.68571278 0.68571278 0.68571278 
X37 0.68749864 0.68749864 0.68749864 0.68749864 
X3g 0.68918652 0.68918652 0.68918652 0.68918652 
X39 0.69067718 0.69067718 0.69067718 0.690677 17 
X40 0.68363346 0.68363346 0.68363346 0.68363349 




11. a. Solution Residual 
2.55613420 0.00668246 
4.09171393 —0.00533953 
4.60840390 —0.01739814 
3.64309950 —0.03171624 
5.13950533 0.01308093 
7.19697808 —0.02081095 
7.68140405 —0.04593118 
5.93227784 0.01692180 
5.81798997 0.04414047 
5.85447806 0.03319707 
5.94202521 —0.00099947 
4.42152959 —0.00072826 
3.32211695 0.02363822 
4.49411604 0.00982052 
4.80968966 0.00846967 
3.81108707 —0.01312902 


This converges in 6 iterations with tolerance $5.00 \times 10^{-2}$ in the $l_\infty$ norm and $\|r^{(6)}\|_\infty = 0.046$.


b. Solution Residual 
2.55613420 0.00668246 
4.09171393 —0.00533953 
4.60840390 —0.01739814 
3.64309950 —0.03171624 
5.13950533 0.01308093 
7.19697808 —0.02081095 
7.68140405 —0.04593118 
5.93227784 0.01692180 
5.81798996 0.04414047 
5.85447805 0.03319706 
5.94202521 —0.00099947 
4.42152959 —0.00072826 
3.32211694 0.02363822 
4.49411603 0.00982052 
4.80968966 0.00846967 
3.81108707 —0.01312902 


This converges in 6 iterations with tolerance $5.00 \times 10^{-2}$ in the $l_\infty$ norm and $\|r^{(6)}\|_\infty = 0.046$.
c. All tolerances lead to the same convergence specifications. 


13. a. Let $\{v^{(1)},\ldots,v^{(n)}\}$ be a set of nonzero $A$-orthogonal vectors for the symmetric positive definite matrix $A$. Then
$\langle v^{(i)}, Av^{(j)}\rangle = 0$, if $i \ne j$. Suppose

$$c_1v^{(1)} + c_2v^{(2)} + \cdots + c_nv^{(n)} = 0,$$
where not all $c_i$ are zero. Suppose $k$ is the smallest integer for which $c_k \ne 0$. Then
$$c_kv^{(k)} + c_{k+1}v^{(k+1)} + \cdots + c_nv^{(n)} = 0.$$

We solve for $v^{(k)}$ to obtain

$$v^{(k)} = -\frac{c_{k+1}}{c_k}v^{(k+1)} - \cdots - \frac{c_n}{c_k}v^{(n)}.$$
Multiplying by $A$ gives
$$Av^{(k)} = -\frac{c_{k+1}}{c_k}Av^{(k+1)} - \cdots - \frac{c_n}{c_k}Av^{(n)},$$
so
$$(v^{(k)})^tAv^{(k)} = -\frac{c_{k+1}}{c_k}(v^{(k)})^tAv^{(k+1)} - \cdots - \frac{c_n}{c_k}(v^{(k)})^tAv^{(n)}$$
$$= -\frac{c_{k+1}}{c_k}\langle v^{(k)}, Av^{(k+1)}\rangle - \cdots - \frac{c_n}{c_k}\langle v^{(k)}, Av^{(n)}\rangle$$
$$= -\frac{c_{k+1}}{c_k}\cdot 0 - \cdots - \frac{c_n}{c_k}\cdot 0 = 0.$$

Since $A$ is positive definite, $v^{(k)} = 0$, which is a contradiction. Thus, all $c_i$ must be zero, and $\{v^{(1)},\ldots,v^{(n)}\}$ is linearly
independent.

b. Let $\{v^{(1)},\ldots,v^{(n)}\}$ be a set of nonzero $A$-orthogonal vectors for the symmetric positive definite matrix $A$, and let $z$ be
orthogonal to $v^{(i)}$, for each $i = 1,\ldots,n$. From part (a), the set $\{v^{(1)},\ldots,v^{(n)}\}$ is linearly independent, so there is a
collection of constants $\beta_1,\ldots,\beta_n$ with

$$z = \sum_{i=1}^{n}\beta_iv^{(i)}.$$

Hence,

$$\langle z, z\rangle = z^tz = \sum_{i=1}^{n}\beta_iz^tv^{(i)} = \sum_{i=1}^{n}\beta_i\cdot 0 = 0,$$

and Theorem 7.30, part (v), implies that $z = 0$.


15. If $A$ is a positive definite matrix whose eigenvalues are $0 < \lambda_1 \le \cdots \le \lambda_n$, then $\|A\|_2 = \lambda_n$ and $\|A^{-1}\|_2 = 1/\lambda_1$, so
$K_2(A) = \lambda_n/\lambda_1$.
For the matrix $A$ in Example 3 we have

$$K_2(A) = \frac{\lambda_5}{\lambda_1} = \frac{700.031}{0.0570737},$$

and the matrix AH has

$$K_2(AH) = \frac{\lambda_5}{\lambda_1} = \frac{1.88052}{0.156370}.$$

Maple gives ConditionNumber(A, 2) = 12265.15914 and ConditionNumber (AH, 2) = 12.02598124.


Exercise Set 8.1 (Page 506) 


1. The linear least-squares polynomial is 1.70784x + 0.89968. 


3. The least-squares polynomials with their errors are, respectively, 0.6208950 + 1.219621x, with E = 2.719 x 107; 
0.5965807 + 1.253293x — 0.01085343x?, with FE = 1.801 x 107°; and 
0.6290193 + 1.185010x + 0.03533252x” — 0.01004723x3, with E = 1.741 x 107°. 
5. a. The linear least-squares polynomial is 72.0845x — 194.138, with error 329. 
. The least-squares polynomial of degree two is 6.61821x? — 1.14352x + 1.23556, with error 1.44 x 107°. 
. The least-squares polynomial of degree three is —0.0136742x? + 6.84557x? — 2.37919x + 3.42904, with error 5.27 x 107+. 


0.372382 with error 418. 


. The least-squares approximation of the form be* is 24.2588e 
. The least-squares approximation of the form bx” is 6.23903x?°™, with error 0.00703. 
. k= 0.8996, E(k) = 0.295 

. k = 0.9052, E(k) = 0.128 Part (b) fits the total experimental data best. 

9. The least squares line for the point average is 0.101 (ACT score) + 0.487. 
11. The linear least-squares polynomial gives $y \approx 0.17952x + 8.2084$.
13. a. $\ln R = \ln 1.304 + 0.5756 \ln W$ b. $E = 25.25$

c. $\ln R = \ln 1.051 + 0.7006 \ln W + 0.06695(\ln W)^2$ d. $E = \sum_i \left(R_i - bW_i^a e^{c(\ln W_i)^2}\right)^2 = 20.30$

Exercise Set 8.2 (Page 518)


1. The linear least-squares approximations are: 


a. $P_1(x) = 1.833333 + 4x$ b. $P_1(x) = -1.600003 + 3.600003x$ c. $P_1(x) = 1.140981 - 0.2958375x$

d. $P_1(x) = 0.1945267 + 3.000001x$ e. $P_1(x) = 0.6109245 + 0.09167105x$ f. $P_1(x) = -1.861455 + 1.666667x$
3. The least squares approximations of degree two are:

a. $P_2(x) = 2.000002 + 2.999991x + 1.000009x^2$ b. $P_2(x) = 0.4000163 - 2.400054x + 3.000028x^2$

c. $P_2(x) = 1.723551 - 0.9313682x + 0.1588827x^2$ d. $P_2(x) = 1.167179 + 0.08204442x + 1.458979x^2$

e. $P_2(x) = 0.4880058 + 0.8291830x - 0.7375119x^2$ f. $P_2(x) = -0.9089523 + 0.6275723x + 0.2597736x^2$
5. a. 0.3427 x 10~° b. 0.0457142 c. 0.000358354 

d. 0.0106445 e. 0.0000134621 f. 0.0000967795 


7. The Gram-Schmidt process produces the following collections of polynomials: 
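a. $\phi_0(x) = 1$, $\phi_1(x) = x - 0.5$, $\phi_2(x) = x^2 - x + \frac{1}{6}$, and $\phi_3(x) = x^3 - 1.5x^2 + 0.6x - 0.05$
b. $\phi_0(x) = 1$, $\phi_1(x) = x - 1$, $\phi_2(x) = x^2 - 2x + \frac{2}{3}$, and $\phi_3(x) = x^3 - 3x^2 + \frac{12}{5}x - \frac{2}{5}$
c. $\phi_0(x) = 1$, $\phi_1(x) = x - 2$, $\phi_2(x) = x^2 - 4x + \frac{11}{3}$, and $\phi_3(x) = x^3 - 6x^2 + 11.4x - 6.8$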
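The polynomials in Exercise 7 can be checked with a short exact-arithmetic sketch. It assumes the inner product is the unweighted integral of the product over the interval, with [0, 1], [0, 2], and [1, 3] for parts (a), (b), and (c); the helper names `poly_mul`, `integrate`, and `monic_orthogonal` are illustrative and not from the text.

```python
from fractions import Fraction

def poly_mul(p, q):
    """Multiply two polynomials given as coefficient lists (lowest degree first)."""
    out = [Fraction(0)] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            out[i + j] += a * b
    return out

def integrate(p, a, b):
    """Exact integral of the polynomial p over [a, b]."""
    a, b = Fraction(a), Fraction(b)
    return sum(c * (b ** (k + 1) - a ** (k + 1)) / (k + 1) for k, c in enumerate(p))

def monic_orthogonal(n, a, b):
    """Gram-Schmidt on 1, x, ..., x^n with <f, g> = integral of f*g over [a, b]."""
    phis = []
    for k in range(n + 1):
        xk = [Fraction(0)] * k + [Fraction(1)]       # the monomial x^k
        p = list(xk)
        for phi in phis:
            # Subtract the projection of x^k onto each earlier polynomial.
            c = integrate(poly_mul(xk, phi), a, b) / integrate(poly_mul(phi, phi), a, b)
            for i, coeff in enumerate(phi):
                p[i] -= c * coeff
        phis.append(p)
    return phis

phis_01 = monic_orthogonal(3, 0, 1)   # assumed interval for part (a)
phis_02 = monic_orthogonal(3, 0, 2)   # assumed interval for part (b)
phis_13 = monic_orthogonal(3, 1, 3)   # assumed interval for part (c)
```

Each result is a list of coefficients from the constant term upward, so `phis_01[2]` holds the coefficients of the degree-two polynomial on [0, 1]. Exact fractions are used so that values such as 1/6 and 12/5 appear without rounding.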
9. The least-squares polynomials of degree two are: 
a. $P_2(x) = 3.833333\phi_0(x) + 4\phi_1(x) + 0.9999998\phi_2(x)$
b. $P_2(x) = 2\phi_0(x) + 3.6\phi_1(x) + 3\phi_2(x)$
c. $P_2(x) = 0.5493061\phi_0(x) - 0.2958369\phi_1(x) + 0.1588785\phi_2(x)$
d. $P_2(x) = 3.194528\phi_0(x) + 3\phi_1(x) + 1.458960\phi_2(x)$
e. $P_2(x) = 0.6567600\phi_0(x) + 0.09167105\phi_1(x) - 0.73751218\phi_2(x)$
f. $P_2(x) = 1.471878\phi_0(x) + 1.666667\phi_1(x) + 0.2597705\phi_2(x)$
11. The Laguerre polynomials are $L_1(x) = x - 1$, $L_2(x) = x^2 - 4x + 2$ and $L_3(x) = x^3 - 9x^2 + 18x - 6$.


Exercise Set 8.3 (Page 527) 


1. The interpolating polynomials of degree two are: 

a. $P_2(x) = 2.377443 + 1.590534(x - 0.8660254) + 0.5320418(x - 0.8660254)x$

b. $P_2(x) = 0.7617600 + 0.8796047(x - 0.8660254)$

c. $P_2(x) = 1.052926 + 0.4154370(x - 0.8660254) - 0.1384262x(x - 0.8660254)$

d. $P_2(x) = 0.5625 + 0.649519(x - 0.8660254) + 0.75x(x - 0.8660254)$
3. Bounds for the maximum errors of polynomials in Exercise 1 are: 

a. 0.1132617 b. 0.04166667 c. 0.08333333 d. 1.000000 
5. The zeros of $T_3$ produce the following interpolating polynomials of degree two.

a. $P_2(x) = 0.3489153 - 0.1744576(x - 2.866025) + 0.1538462(x - 2.866025)(x - 2)$

b. $P_2(x) = 0.1547375 - 0.2461152(x - 1.866025) + 0.1957273(x - 1.866025)(x - 1)$

c. P2(x) = 0.6166200 — 0.2370869(x — 0.9330127) — 0.7427732(x — 0.9330127) (x — 0.5) 

d. P2(x) = 3.0177125 + 1.883800(x — 2.866025) + 0.2584625(x — 2.866025) (x — 2) 

7. The cubic polynomial $\frac{383}{384}x - \frac{5}{32}x^3$ approximates $\sin x$ with error at most $7.19 \times 10^{-4}$.
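The coefficients in Exercise 7 can be reproduced by Chebyshev economization. The following is a minimal sketch, assuming the starting point is the degree-five Maclaurin polynomial of sin x on [-1, 1]; the variable names are illustrative only.

```python
from fractions import Fraction

# Maclaurin polynomial of sin x through degree 5 (coefficient of x^k at index k).
maclaurin = [Fraction(0), Fraction(1), Fraction(0), Fraction(-1, 6), Fraction(0), Fraction(1, 120)]
# Chebyshev polynomial T_5(x) = 16x^5 - 20x^3 + 5x.
T5 = [Fraction(0), Fraction(5), Fraction(0), Fraction(-20), Fraction(0), Fraction(16)]

# Subtract the multiple of T_5 that cancels the x^5 term; |T_5| <= 1 on [-1, 1],
# so the removed term is at most `scale` in magnitude.
scale = maclaurin[-1] / T5[-1]
cubic = [c - scale * t for c, t in zip(maclaurin, T5)][:-1]

# Maclaurin remainder (the x^6 coefficient is zero, so the bound is 1/7!) plus the removed term.
bound = Fraction(1, 5040) + scale
```

Here `cubic` holds the coefficients 0, 383/384, 0, -5/32 (followed by a zero x^4 coefficient), and `float(bound)` is roughly 7.19e-4.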


9. The change of variable $x = \cos\theta$ produces

$$\int_{-1}^{1}\frac{T_n^2(x)}{\sqrt{1-x^2}}\,dx = \int_{-1}^{1}\frac{[\cos(n\arccos x)]^2}{\sqrt{1-x^2}}\,dx = \int_{0}^{\pi}(\cos(n\theta))^2\,d\theta = \frac{\pi}{2}.$$
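The value π/2 in Exercise 9 can be sanity-checked numerically. This is a small sketch that applies the composite midpoint rule to the integral in the variable θ; the function name and the number of panels are arbitrary choices.

```python
import math

def chebyshev_norm_sq(n, panels=2000):
    """Midpoint-rule approximation of the integral of cos(n*theta)^2 over [0, pi]."""
    h = math.pi / panels
    return h * sum(math.cos(n * (k + 0.5) * h) ** 2 for k in range(panels))

values = [chebyshev_norm_sq(n) for n in range(1, 6)]
```

Each entry of `values` should agree with π/2 to near machine precision, because the midpoint rule integrates low-order trigonometric terms over a full period very accurately.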
11. It was shown in text (see Eq. (8.13)) that the zeros of $T_n'(x)$ occur at $x_k' = \cos(k\pi/n)$ for $k = 1, \ldots, n-1$. Because $x_0' = \cos(0) = 1$, $x_n' = \cos(\pi) = -1$, and all values of the cosine lie in the interval $[-1, 1]$, it remains only to show that the zeros are distinct. This follows from the fact that for each $k = 1, \ldots, n-1$, we have $k\pi/n$ in the interval $(0, \pi)$ and on this interval $D_x \cos(x) = -\sin x < 0$. As a consequence, the cosine is one-to-one on $(0, \pi)$, and these $n - 1$ zeros of $T_n'(x)$ are distinct.

Exercise Set 8.4 (Page 537)


1. The Padé approximations of degree two for $f(x) = e^{2x}$ are:
n = 2, m = 0: $r_{2,0}(x) = 1 + 2x + 2x^2$
n = 1, m = 1: $r_{1,1}(x) = (1 + x)/(1 - x)$
n = 0, m = 2: $r_{0,2}(x) = (1 - 2x + 2x^2)^{-1}$


i   x_i   f(x_i)   r_{2,0}(x_i)   r_{1,1}(x_i)   r_{0,2}(x_i)
1   0.2   1.4918   1.4800         1.5000         1.4706
2   0.4   2.2255   2.1200         2.3333         1.9231
3   0.6   3.3201   2.9200         4.0000         1.9231
4   0.8   4.9530   3.8800         9.0000         1.4706
5   1.0   7.3891   5.0000         undefined      1.0000


3. $r_{2,3}(x) = \left(1 + \frac{2}{5}x + \frac{1}{20}x^2\right)/\left(1 - \frac{3}{5}x + \frac{3}{20}x^2 - \frac{1}{60}x^3\right)$


i   x_i   f(x_i)       r_{2,3}(x_i)

1   0.2   1.22140276   1.22140277
2   0.4   1.49182470   1.49182561
3   0.6   1.82211880   1.82213210
4   0.8   2.22554093   2.22563652
5   1.0   2.71828183   2.71875000
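The Padé coefficients in Exercise 3 follow from the Maclaurin coefficients of $e^x$ by solving a small linear system. The sketch below is one way to do this in exact arithmetic; the helpers `solve` and `pade` are hypothetical names written for this illustration, and the normalization $q_0 = 1$ is assumed.

```python
from fractions import Fraction
from math import factorial

def solve(A, b):
    """Gauss-Jordan elimination with exact Fractions; A must be square and nonsingular."""
    n = len(A)
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]
    for col in range(n):
        piv = next(r for r in range(col, n) if M[r][col] != 0)
        M[col], M[piv] = M[piv], M[col]
        M[col] = [v / M[col][col] for v in M[col]]
        for r in range(n):
            if r != col:
                f = M[r][col]
                M[r] = [vr - f * vc for vr, vc in zip(M[r], M[col])]
    return [M[r][n] for r in range(n)]

def pade(a, n, m):
    """Numerator p (degree n) and denominator q (degree m, q[0] = 1) from Maclaurin coefficients a."""
    coef = lambda k: a[k] if k >= 0 else Fraction(0)
    # Coefficients of x^(n+1), ..., x^(n+m) in q(x)*f(x) must vanish.
    A = [[coef(k - j) for j in range(1, m + 1)] for k in range(n + 1, n + m + 1)]
    b = [-coef(k) for k in range(n + 1, n + m + 1)]
    q = [Fraction(1)] + solve(A, b)
    # Coefficients of x^0, ..., x^n in q(x)*f(x) give the numerator.
    p = [sum(q[j] * coef(k - j) for j in range(min(k, m) + 1)) for k in range(n + 1)]
    return p, q

a = [Fraction(1, factorial(k)) for k in range(6)]   # Maclaurin coefficients of e^x
p, q = pade(a, 2, 3)
r_at_1 = sum(p) / sum(q)                             # r_{2,3}(1)
```

Evaluating at $x = 1$ gives 87/32 = 2.71875, which agrees with the last row of the table.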


5. $r_{3,3}(x) = \left(x - \frac{7}{60}x^3\right)/\left(1 + \frac{1}{20}x^2\right)$


i   x_i   f(x_i)       Maclaurin polynomial of degree 6   r_{3,3}(x_i)
0   0.0   0.00000000   0.00000000                         0.00000000
1   0.1   0.09983342   0.09966675                         0.09938640
2   0.2   0.19866933   0.19733600                         0.19709571
3   0.3   0.29552021   0.29102025                         0.29246305
4   0.4   0.38941834   0.37875200                         0.38483660
5   0.5   0.47942554   0.45859375                         0.47357724


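7. The Padé approximations of degree five are:

a. $r_{0,5}(x) = \left(1 + x + \frac{1}{2}x^2 + \frac{1}{6}x^3 + \frac{1}{24}x^4 + \frac{1}{120}x^5\right)^{-1}$

b. $r_{1,4}(x) = \left(1 - \frac{1}{5}x\right)/\left(1 + \frac{4}{5}x + \frac{3}{10}x^2 + \frac{1}{15}x^3 + \frac{1}{120}x^4\right)$
c. $r_{3,2}(x) = \left(1 - \frac{3}{5}x + \frac{3}{20}x^2 - \frac{1}{60}x^3\right)/\left(1 + \frac{2}{5}x + \frac{1}{20}x^2\right)$
d. $r_{4,1}(x) = \left(1 - \frac{4}{5}x + \frac{3}{10}x^2 - \frac{1}{15}x^3 + \frac{1}{120}x^4\right)/\left(1 + \frac{1}{5}x\right)$


i   x_i   f(x_i)       r_{0,5}(x_i)   r_{1,4}(x_i)   r_{3,2}(x_i)   r_{4,1}(x_i)

1   0.2   0.81873075   0.81873081     0.81873074     0.81873075     0.81873077
2   0.4   0.67032005   0.67032276     0.67031942     0.67031963     0.67032099
3   0.6   0.54881164   0.54883296     0.54880635     0.54880763     0.54882143
4   0.8   0.44932896   0.44941181     0.44930678     0.44930966     0.44937931
5   1.0   0.36787944   0.36809816     0.36781609     0.36781609     0.36805556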


9. $r_{T_{2,0}}(x) = (1.266066T_0(x) - 1.130318T_1(x) + 0.2714953T_2(x))/T_0(x)$
$r_{T_{1,1}}(x) = (0.9945705T_0(x) - 0.4569046T_1(x))/(T_0(x) + 0.48038745T_1(x))$
$r_{T_{0,2}}(x) = 0.7940220T_0(x)/(T_0(x) + 0.8778575T_1(x) + 0.1774266T_2(x))$

i   x_i    f(x_i)       r_{T_{2,0}}(x_i)   r_{T_{1,1}}(x_i)   r_{T_{0,2}}(x_i)
1   0.25   0.77880078   0.74592811         0.78595377         0.74610974
2   0.50   0.60653066   0.56515935         0.61774075         0.58807059
3   1.00   0.36787944   0.40724330         0.36319269         0.38633199

11. $r_{T_{2,2}}(x) = \dfrac{0.91747T_1(x)}{T_0(x) + 0.088914T_1(x)}$

i   x_i    f(x_i)       r_{T_{2,2}}(x_i)
0   0.00   0.00000000   0.00000000
1   0.10   0.09983342   0.09093843
2   0.20   0.19866933   0.18028797
3   0.30   0.29552021   0.26808992
4   0.40   0.38941834   0.35438412
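13. a. $e^x = e^{M \ln\sqrt{10} + s} = e^{M \ln\sqrt{10}}\, e^s = e^{\ln 10^{M/2}}\, e^s = 10^{M/2} e^s$

b. $e^s \approx \left(1 + \frac{1}{2}s + \frac{1}{10}s^2 + \frac{1}{120}s^3\right)/\left(1 - \frac{1}{2}s + \frac{1}{10}s^2 - \frac{1}{120}s^3\right)$, with $|\text{error}| < 3.75 \times 10^{-7}$.

c. Set $M = \text{round}(0.8685889638x)$, $s = x - M/(0.8685889638)$, and
$\hat{f} = \left(1 + \frac{1}{2}s + \frac{1}{10}s^2 + \frac{1}{120}s^3\right)/\left(1 - \frac{1}{2}s + \frac{1}{10}s^2 - \frac{1}{120}s^3\right)$. Then $f = (3.16227766)^M \hat{f}$.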
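The range reduction in Exercise 13 can be sketched as follows. The code assumes the constant 0.8685889638 is $1/\ln\sqrt{10}$ and 3.16227766 is $\sqrt{10}$, and it uses the degree-(3,3) rational approximation of $e^s$; the function names are illustrative.

```python
import math

LN_SQRT10 = 0.5 * math.log(10.0)     # ln(sqrt(10)), the reciprocal of 0.8685889638...

def pade33_exp(s):
    """Rational approximation of e^s with cubic numerator and denominator."""
    num = 1 + s / 2 + s * s / 10 + s ** 3 / 120
    den = 1 - s / 2 + s * s / 10 - s ** 3 / 120
    return num / den

def exp_approx(x):
    """Approximate e^x as sqrt(10)^M * r(s), where |s| <= ln(sqrt(10))/2."""
    M = round(x / LN_SQRT10)
    s = x - M * LN_SQRT10
    return math.sqrt(10.0) ** M * pade33_exp(s)

samples = [-5.0, -0.3, 0.0, 1.0, 7.5]
rel_errors = [abs(exp_approx(x) / math.exp(x) - 1.0) for x in samples]
```

Because the reduced argument `s` never exceeds about 0.58 in magnitude, the rational approximation only has to be accurate on a short interval, which is why so few terms suffice.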


Exercise Set 8.5 (Page 546) 


1. $S_2(x) = \frac{\pi^2}{3} - 4\cos x + \cos 2x$

3. $S_3(x) = 3.676078 - 3.676078\cos x + 1.470431\cos 2x - 0.7352156\cos 3x + 3.676078\sin x - 2.940862\sin 2x$

5. $S_n(x) = \frac{1}{2} + \frac{1}{\pi}\sum_{k=1}^{n-1}\frac{1 - (-1)^k}{k}\sin kx$

7. The trigonometric least-squares polynomials are:

a. $S_2(x) = \cos 2x$

b. $S_2(x) = 0$

c. $S_3(x) = 1.566453 + 0.5886815\cos x - 0.2700642\cos 2x + 0.2175679\cos 3x + 0.8341640\sin x - 0.3097866\sin 2x$
d. $S_3(x) = -2.046326 + 3.883872\cos x - 2.320482\cos 2x + 0.7310818\cos 3x$

9. The trigonometric least-squares polynomial is $S_3(x) = -0.4968929 + 0.2391965\cos x + 1.515393\cos 2x + 0.2391965\cos 3x - 1.150649\sin x$, with error $E(S_3) = 7.271197$.

11. The trigonometric least-squares polynomials and their errors are

a. $S_3(x) = -0.08676065 - 1.446416\cos\pi(x - 3) - 1.617554\cos 2\pi(x - 3) + 3.980729\cos 3\pi(x - 3) - 2.154320\sin\pi(x - 3) + 3.907451\sin 2\pi(x - 3)$ with $E(S_3) = 210.90453$

b. $S_4(x) = -0.0867607 - 1.446416\cos\pi(x - 3) - 1.617554\cos 2\pi(x - 3) + 3.980729\cos 3\pi(x - 3) - 2.354088\cos 4\pi(x - 3) - 2.154320\sin\pi(x - 3) + 3.907451\sin 2\pi(x - 3) - 1.166181\sin 3\pi(x - 3)$ with $E(S_4) = 169.4943$

13. Let $f(-x) = -f(x)$. The integral $\int_{-a}^{0} f(x)\,dx$ under the change of variable $t = -x$ transforms to

$$-\int_{a}^{0} f(-t)\,dt = \int_{0}^{a} f(-t)\,dt = -\int_{0}^{a} f(t)\,dt = -\int_{0}^{a} f(x)\,dx.$$

Thus,

$$\int_{-a}^{a} f(x)\,dx = \int_{-a}^{0} f(x)\,dx + \int_{0}^{a} f(x)\,dx = -\int_{0}^{a} f(x)\,dx + \int_{0}^{a} f(x)\,dx = 0.$$

17. The steps are nearly identical to those for determining the constants $b_k$ except for the additional constant term $a_0$ in the cosine series. In this case

$$0 = \frac{\partial E}{\partial a_0} = 2\sum_{j=0}^{2m-1}\left[y_j - S_n(x_j)\right](-1/2),$$

so

$$0 = \sum_{j=0}^{2m-1} y_j - \sum_{j=0}^{2m-1}\left(\frac{a_0}{2} + a_n\cos nx_j + \sum_{k=1}^{n-1}(a_k\cos kx_j + b_k\sin kx_j)\right).$$

The orthogonality implies that only the constant term remains in the second sum, and we have

$$0 = \sum_{j=0}^{2m-1} y_j - \frac{a_0}{2}(2m), \quad\text{which implies that}\quad a_0 = \frac{1}{m}\sum_{j=0}^{2m-1} y_j.$$



Exercise Set 8.6 (Page 557) 


1. The trigonometric interpolating polynomials are: 
a. S3(x) = —12.33701 + 4.934802 cos x — 2.467401 cos 2x + 4.934802 sin x 
b. S2(x) = —6.168503 + 9.869604 cos x — 3.701102 cos 2x + 4.934802 sin x 
c. S2(x) = 1.570796 — 1.570796 cos x 
d. S3(x) = —0.5 — 0.5 cos 2x + sin x 


3. The Fast Fourier Transform Algorithm gives the following trigonometric interpolating polynomials.

a. $S_4(x) = -11.10331 + 2.467401\cos x - 2.467401\cos 2x + 2.467401\cos 3x - 1.233701\cos 4x + 5.956833\sin x - 2.467401\sin 2x + 1.022030\sin 3x$
b. $S_4(x) = 1.570796 - 1.340759\cos x - 0.2300378\cos 3x$
c. $S_4(x) = -0.1264264 + 0.2602724\cos x - 0.3011140\cos 2x + 1.121372\cos 3x + 0.04589648\cos 4x - 0.1022190\sin x + 0.2754062\sin 2x - 2.052955\sin 3x$
d. $S_4(x) = -0.1526819 + 0.04754278\cos x + 0.6862114\cos 2x - 1.216913\cos 3x + 1.176143\cos 4x - 0.8179387\sin x + 0.1802450\sin 2x + 0.2753402\sin 3x$


5.
     Approximation   Actual
a.   -69.76415       -62.01255
b.   9.869602        9.869604
c.   -0.7943605      -0.2739383
d.   -0.9593287      -0.9557781


7. The $b_j$ terms are all zero. The $a_j$ terms are as follows:


a_0  = -4.0008033   a_1  = 3.7906715   a_2  = -2.2230259   a_3  = 0.6258042
a_4  = -0.3030271   a_5  = 0.1813613   a_6  = -0.1216231   a_7  = 0.0876136
a_8  = -0.0663172   a_9  = 0.0520612   a_10 = -0.0420333   a_11 = 0.0347040
a_12 = -0.0291807   a_13 = 0.0249129   a_14 = -0.0215458   a_15 = 0.0188421
a_16 = -0.0166380   a_17 = 0.0148174   a_18 = -0.0132962   a_19 = 0.0120123
a_20 = -0.0109189   a_21 = 0.0099801   a_22 = -0.0091683   a_23 = 0.0084617
a_24 = -0.0078430   a_25 = 0.0072984   a_26 = -0.0068167   a_27 = 0.0063887
a_28 = -0.0060069   a_29 = 0.0056650   a_30 = -0.0053578   a_31 = 0.0050810
a_32 = -0.0048308   a_33 = 0.0046040   a_34 = -0.0043981   a_35 = 0.0042107
a_36 = -0.0040398   a_37 = 0.0038837   a_38 = -0.0037409   a_39 = 0.0036102
a_40 = -0.0034903   a_41 = 0.0033803   a_42 = -0.0032793   a_43 = 0.0031866
a_44 = -0.0031015   a_45 = 0.0030233   a_46 = -0.0029516   a_47 = 0.0028858
a_48 = -0.0028256   a_49 = 0.0027705   a_50 = -0.0027203   a_51 = 0.0026747
a_52 = -0.0026333   a_53 = 0.0025960   a_54 = -0.0025626   a_55 = 0.0025328
a_56 = -0.0025066   a_57 = 0.0024837   a_58 = -0.0024642   a_59 = 0.0024478
a_60 = -0.0024345   a_61 = 0.0024242   a_62 = -0.0024169   a_63 = 0.0024125


Exercise Set 9.1 (Page 568)


1. a. The eigenvalues and associated eigenvectors are $\lambda_1 = 2$, $v^{(1)} = (1, 0, 0)^t$; $\lambda_2 = 1$, $v^{(2)} = (0, 2, 1)^t$; and $\lambda_3 = -1$, $v^{(3)} = (-1, 1, 1)^t$. The set is linearly independent.

b. The eigenvalues and associated eigenvectors are $\lambda_1 = 2$, $v^{(1)} = (0, 1, 0)^t$; $\lambda_2 = 3$, $v^{(2)} = (1, 0, 1)^t$; and $\lambda_3 = 1$, $v^{(3)} = (1, 0, -1)^t$. The set is linearly independent.
c. The eigenvalues and associated eigenvectors are $\lambda_1 = 1$, $v^{(1)} = (0, -1, 1)^t$; $\lambda_2 = 1 + \sqrt{2}$, $v^{(2)} = (\sqrt{2}, 1, 1)^t$; and $\lambda_3 = 1 - \sqrt{2}$, $v^{(3)} = (-\sqrt{2}, 1, 1)^t$. The set is linearly independent.
d. The eigenvalues and associated eigenvectors are $\lambda_1 = \lambda_2 = 2$, $v^{(1)} = v^{(2)} = (1, 0, 0)^t$; $\lambda_3 = 3$ with $v^{(3)} = (0, 1, 1)^t$. There are only 2 linearly independent eigenvectors.

3. a. The three eigenvalues are within $\{\lambda \mid |\lambda| \le 2\} \cup \{\lambda \mid |\lambda - 2| \le 2\}$ so $\rho(A) \le 4$.
b. The three eigenvalues are within $\{\lambda \mid |\lambda - 4| \le 2\}$ so $\rho(A) \le 6$.
c. The three real eigenvalues satisfy $0 \le \lambda \le 6$ so $\rho(A) \le 6$.
d. The three real eigenvalues satisfy $1.25 \le \lambda \le 8.25$ so $1.25 \le \rho(A) \le 8.25$.

5. All the matrices except (d) have 3 linearly independent eigenvectors. The matrix in part (d) has only 2 linearly independent eigenvectors. One choice for $P$ in each case is

a. $\begin{bmatrix} -1 & 0 & 1 \\ 1 & 2 & 0 \\ 1 & 1 & 0 \end{bmatrix}$, b. $\begin{bmatrix} 0 & -1 & 1 \\ 1 & 0 & 0 \\ 0 & 1 & 1 \end{bmatrix}$, c. $\begin{bmatrix} 0 & \sqrt{2} & -\sqrt{2} \\ -1 & 1 & 1 \\ 1 & 1 & 1 \end{bmatrix}$

7. The vectors are linearly dependent since $-2v_1 + 7v_2 - 3v_3 = 0$.

9. If $c_1v_1 + \cdots + c_kv_k = 0$, then for any $j$, with $1 \le j \le k$, we have $c_1v_j^tv_1 + \cdots + c_kv_j^tv_k = 0$. But orthogonality gives $c_iv_j^tv_i = 0$, for $i \ne j$, so $c_jv_j^tv_j = 0$ and since $v_j^tv_j \ne 0$, we have $c_j = 0$.

11. Since $\{v_i\}_{i=1}^{n}$ is linearly independent in $\mathbb{R}^n$, there exist numbers $c_1, \ldots, c_n$ with

$$x = c_1v_1 + \cdots + c_nv_n.$$

Hence, for any $k$, with $1 \le k \le n$,
$$v_k^tx = c_1v_k^tv_1 + \cdots + c_nv_k^tv_n = c_kv_k^tv_k = c_k.$$

13. a. i. $0 = c_1(1, 1)^t + c_2(-2, 1)^t$ implies that $\begin{bmatrix} 1 & -2 \\ 1 & 1 \end{bmatrix}\begin{bmatrix} c_1 \\ c_2 \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \end{bmatrix}$. But $\det\begin{bmatrix} 1 & -2 \\ 1 & 1 \end{bmatrix} = 3 \ne 0$, so by Theorem 6.7 we have $c_1 = c_2 = 0$.
ii. $\{(1, 1)^t, (-3/2, 3/2)^t\}$.
iii. $\{(\sqrt{2}/2, \sqrt{2}/2)^t, (-\sqrt{2}/2, \sqrt{2}/2)^t\}$.
b. i. The determinant of the matrix with these vectors as columns is $-2 \ne 0$, so $\{(1, 1, 0)^t, (1, 0, 1)^t, (0, 1, 1)^t\}$ is a linearly independent set.
ii. $\{(1, 1, 0)^t, (1/2, -1/2, 1)^t, (-2/3, 2/3, 2/3)^t\}$
iii. $\{(\sqrt{2}/2, \sqrt{2}/2, 0)^t, (\sqrt{6}/6, -\sqrt{6}/6, \sqrt{6}/3)^t, (-\sqrt{3}/3, \sqrt{3}/3, \sqrt{3}/3)^t\}$
c. i. If $0 = c_1(1, 1, 1, 1)^t + c_2(0, 2, 2, 2)^t + c_3(1, 0, 0, 1)^t$, then we have
$(E_1)$: $c_1 + c_3 = 0$, $(E_2)$: $c_1 + 2c_2 = 0$, $(E_3)$: $c_1 + 2c_2 = 0$, $(E_4)$: $c_1 + 2c_2 + c_3 = 0$.
Subtracting $(E_3)$ from $(E_4)$ implies that $c_3 = 0$. Hence, from $(E_1)$ we have $c_1 = 0$, and from $(E_2)$ we have $c_2 = 0$. The vectors are linearly independent.
ii. $\{(1, 1, 1, 1)^t, (-3/2, 1/2, 1/2, 1/2)^t, (0, -1/3, -1/3, 2/3)^t\}$
iii. $\{(1/2, 1/2, 1/2, 1/2)^t, (-\sqrt{3}/2, \sqrt{3}/6, \sqrt{3}/6, \sqrt{3}/6)^t, (0, -\sqrt{6}/6, -\sqrt{6}/6, \sqrt{6}/3)^t\}$
d. i. If $A$ is the matrix whose columns are the vectors $v_1, v_2, v_3, v_4, v_5$, then $\det A = 60 \ne 0$, so the vectors are linearly independent.
ii. $\{(2, 2, 3, 2, 3)^t, (2, -1, 0, -1, 0)^t, (0, 0, 1, 0, -1)^t, (1, 2, -1, 0, -1)^t, (-2/7, 3/7, 2/7, -1, 2/7)^t\}$
iii. $\{(\sqrt{30}/15, \sqrt{30}/15, \sqrt{30}/10, \sqrt{30}/15, \sqrt{30}/10)^t, (\sqrt{6}/3, -\sqrt{6}/6, 0, -\sqrt{6}/6, 0)^t, (0, 0, \sqrt{2}/2, 0, -\sqrt{2}/2)^t, (\sqrt{7}/7, 2\sqrt{7}/7, -\sqrt{7}/7, 0, -\sqrt{7}/7)^t, (-\sqrt{70}/35, 3\sqrt{70}/70, \sqrt{70}/35, -\sqrt{70}/10, \sqrt{70}/35)^t\}$

15. A strictly diagonally dominant matrix has all its diagonal elements larger in magnitude than the sum of the magnitudes of all the other elements in its row. As a consequence, the magnitude of the center of each Geršgorin circle exceeds in magnitude the radius of the circle. No circle can therefore include the origin. Hence 0 cannot be an eigenvalue of the matrix, and the matrix is nonsingular.
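The Geršgorin argument above can be illustrated with a few lines of code. This is only a sketch: the 3 x 3 matrix is a made-up example (it is not taken from the exercises), and the helper names are hypothetical.

```python
def gershgorin_discs(A):
    """Return (center, radius) for the disc associated with each row of A."""
    n = len(A)
    return [(A[i][i], sum(abs(A[i][j]) for j in range(n) if j != i)) for i in range(n)]

def strictly_diagonally_dominant(A):
    """True when every diagonal entry exceeds the sum of the other magnitudes in its row."""
    return all(abs(center) > radius for center, radius in gershgorin_discs(A))

def spectral_radius_bound(A):
    """Upper bound on the spectral radius: the farthest any disc reaches from the origin."""
    return max(abs(center) + radius for center, radius in gershgorin_discs(A))

# Hypothetical example matrix, chosen to be strictly diagonally dominant.
A = [[4, 1, 1],
     [0, 2, 1],
     [-2, 0, 9]]
discs = gershgorin_discs(A)
```

For this matrix every disc has a center whose magnitude exceeds its radius, so no disc contains the origin and 0 cannot be an eigenvalue.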


Exercise Set 9.2 (Page 573) 


1. In each instance we will compare the characteristic polynomial of A, denoted C(A), to that of B, denoted C(B). They must 


agree if the matrices are to be similar. 
a. $C(A) = x^2 - 4x + 3 \ne x^2 - 2x - 3 = C(B)$.
b. $C(A) = x^2 - 5x + 6 \ne x^2 - 6x + 6 = C(B)$.
c. $C(A) = x^3 - 4x^2 + 5x - 2 \ne x^3 - 4x^2 + 5x - 6 = C(B)$.
d. $C(A) = x^3 - 5x^2 + 12x - 11 \ne x^3 - 4x^2 + 4x + 11 = C(B)$.


3. In each case we have $A^3 = (PDP^{-1})(PDP^{-1})(PDP^{-1}) = PD^3P^{-1}$.


% 1 9 
5 5 
ml og al » | 4 | 
5 5 
i 8 0 0 
" 4 3 2 da. | 0 8 0 
* 5 5 5 0 0 8 
2 4 _6 
5 5 5 


5. They are all diagonalizable with P and D as follows. 


11. 


13. 


15. 


17. 


19. 


-1 1 5 0 
= 4 = 
a P=[ 1 | and p=| 4 > | 


1 -l 1 0 
Reel al and es eed 


fi -1 0 3 0 0 
«e P=} 0 01 and D=}] 0 1 0 
PL 1:0 00 1 
[ a: af Oo 1+ 72 0 0 
d. P= 1 1 -1 and D= 0 1-Vv2 0 
[| 1 1 1 0 0 1 


. Only the matrices in parts (a) and (c) are positive definite. 


p42 ¥ 7 2 q -# 3 0 0 
a O= ja ja and p=| c QO= 0 1 0 and D= 0 2 0 
La er _s fo # 001 


. In each case the matrix fails to have 3 linearly independent eigenvectors. 


a. det(A) = 12, so A is nonsingular. b. det(A) = —1, so A is nonsingular. 
c. det(A) = 12, so A is nonsingular. d. det(A) = 1, so A is nonsingular. 
a. The eigenvalues and associated eigenvectors are

$\lambda_1 = 5.307857563$, $(0.59020967, 0.51643129, 0.62044441)^t$;

$\lambda_2 = -0.4213112993$, $(0.77264234, -0.13876278, -0.61949069)^t$;

$\lambda_3 = -0.1365462647$, $(0.23382978, -0.84501102, 0.48091581)^t$.


b. $A$ is not positive definite because $\lambda_2 < 0$ and $\lambda_3 < 0$.


Because $A$ is similar to $B$ and $B$ is similar to $C$, there exist invertible matrices $S$ and $T$ with $A = S^{-1}BS$ and $B = T^{-1}CT$. Hence $A$ is similar to $C$ because

$$A = S^{-1}BS = S^{-1}(T^{-1}CT)S = (S^{-1}T^{-1})C(TS) = (TS)^{-1}C(TS).$$

The matrix $A$ has an eigenvalue of multiplicity 1 at $\lambda_1 = 3$ with eigenvector $s_1 = (0, 1, 1)^t$, and an eigenvalue of multiplicity 2 at $\lambda_2 = 2$ with linearly independent eigenvectors $s_2 = (1, 1, 0)^t$ and $s_3 = (-2, 0, 1)^t$. Let $S_1 = \{s_1, s_2, s_3\}$, $S_2 = \{s_2, s_1, s_3\}$, and $S_3 = \{s_2, s_3, s_1\}$. Then $A = S_1^{-1}D_1S_1 = S_2^{-1}D_2S_2 = S_3^{-1}D_3S_3$, so $A$ is similar to $D_1$, $D_2$, and $D_3$.

The matrix $A$ has an eigenvalue of multiplicity 1 at $\lambda_1 = 3$, and an eigenvalue of multiplicity 2 at $\lambda_2 = 2$. However, $\lambda_2 = 2$ has only one linearly independent eigenvector, so by Theorem 9.13, $A$ is not similar to a diagonal matrix.

The proof of Theorem 9.13 follows by considering the form the diagonal matrix must assume. The matrix $A$ is similar to a diagonal matrix $D$ if and only if an invertible matrix $S$ exists with $D = S^{-1}AS$, which is equivalent to $AS = SD$, with $S$ invertible. Suppose that we have $AS = SD$ with the columns of $S$ denoted $s_1, s_2, \ldots, s_n$ and the diagonal elements of $D$ denoted $d_1, d_2, \ldots, d_n$. Then $As_i = d_is_i$ for each $i = 1, 2, \ldots, n$. Hence each $d_i$ is an eigenvalue of $A$ with corresponding eigenvector $s_i$. The matrix $S$ is invertible, and consequently $A$ is similar to $D$, if and only if there are $n$ linearly independent eigenvectors that can be placed in the columns of $S$.
Exercise Set 9.3 (Page 590)


1. The approximate eigenvalues and approximate eigenvectors are: 
a. 9 = 3.666667,  x® = (0.9772727, 0.9318182, 1)! 
b. «® = 2.000000, x® = (1,1,0.5)' 
ec. w® = 5.000000, x = (—0.2578947, 1, —0.2842105)! 
d. u® = 5.038462, x® = (1,0.2213741, 0.3893130, 0.4045802)' 
3. The approximate eigenvalues and approximate eigenvectors are: 
a. 9 = 1.027730, x = (—0.1889082, 1, —0.7833622)' 
b. w® = —0.4166667,  x® = (1, —0.75, —0.6666667)! 
c. w® = 17.64493, x9 = (—0.3805794, —0.09079 132, 1) 
d. uw = 1.378684, x = (—0.3690277, —0.2522880, 0.2077438, 1)! 
5. The approximate eigenvalues and approximate eigenvectors are: 
a. Ww = 3.959538, x = (0.5816124, 0.5545606, 0.5951383)! 
b. w® = 2.0000000, x = (—0.6666667, —0.6666667, —0.3333333)' 
ce. w® = 7.189567, x® = (0.5995308, 0.7367472, 0.3126762)' 
d. 2 = 6.037037, x = (0.5073714, 0.4878571, —0.6634857, —0.2536857)' 
7. The approximate eigenvalues and approximate eigenvectors are: 
a. Ay = 3.999908, x = (0.9999943, 0.9999828, 1)! 
b. Ae? = 2.414214, x49) = (1,0.7071429, 0.7070707)! 
c. Aye = 5.124749, x = (—0.2424476, 1, —0.3199733)! 
d. Aye? = 5.235861, x? = (1,0.6178361, 0.1181667, 0.4999220)' 
a. © = 1,00001523 with x® = (—0.19999391, 1, —0.79999087)! 
b. 0 = —0.41421356 with x°” = (1, —0.70709184, —0.707121720' 
c 


. The method did not converge in 25 iterations. However, convergence occurred with 44?) = 1.63663642 with 
x = (—0.57068151, 0.3633658, 1)' 


d. ju = 1.38195929 with x® = (—0.38194003, —0.23610068, 0.23601909, 1) 
11. The approximate eigenvalues and approximate eigenvectors are: 

a. w® = 4,0000000, x = (0.5773547, 0.5773282, 0.5773679)! 

b. wl?) = 2.414214, x“) = (—0.7071068, —0.5000255, —0.4999745)' 

c. 99 = 7.223663, x" = (0.6247845, 0.7204271, 0.3010466)! 

d. wu? = 7.086130, x? = (0.3325999, 0.2671862, —0.7590108, —0.4918246)' 
13. The approximate eigenvalues and approximate eigenvectors are: 

a. Ao = 1.000000, x = (—2.999908, 2.999908, 0)! 

b. Asx = 1.000000, x = (0, — 1.414214, 1.414214)! 

c. Age = 1.636734, x = (1.783218, —1.135350, —3.124733)! 

d. A.~" = 3.618177, x“ = (0.7236390, — 1.170573, 1.170675, —0.2763374)' 
15. The approximate eigenvalues and approximate eigenvectors are: 

a. «® = 4.000001, x = (0.9999773, 0.99993134, 1)! 

b. The method fails because of division by zero. 

ce. w™ = 5.124890, x"? = (—0.2425938, 1, —0.3196351)' 

d. "> = 5.236112, x“) = (1,0.6125369, 0.1217216, 0.4978318)' 
17. The approximate eigenvalues and approximate eigenvectors are: 

a. w° = 1.000000, x? = (0.1542373, —0.7715828, 0.6171474)' 

b. x“) = 1.000000, x“) = (0.00007432, —0.7070723, 0.7071413)' 

c. uw" = 4.961699, x" = (—0.4814472, 0.05180473, 0.8749428)! 

d. uu"? = 4.428007, x"? = (0.7194230, 0.4231908, 0.1153589, 0.5385466)' 
19. a. We have $|\lambda| \le 6$ for all eigenvalues $\lambda$.


b. The approximate eigenvalue and approximate eigenvector are 
pw) = 0.69766854, x39) = (1, 0.7166727, 0.2568099, 0.04601217)’. 


c. The characteristic polynomial is $P(\lambda) = \lambda^4 - \frac{1}{4}\lambda - \frac{1}{16}$, and the eigenvalues are $\lambda_1 = 0.6976684972$, $\lambda_2 = -0.2301775942 + 0.56965884i$, $\lambda_3 = -0.2301775942 - 0.56965884i$, and $\lambda_4 = -0.237313308$.

d. The beetle population should approach zero since A is convergent. 
21. Using the Inverse Power method with x = (1,0,0, 1,0, 0, 1,0,0, 1)! and g = 0 gives the following results: 

a. w) = 1.0201926, so p(A~!) © 1/u = 0.9802071; 

b. 2° = 1.0404568, so p(A7!) © 1/4 = 0.9611163; 

ce. w™) = 1,0606974, so p(A~!) © 1/p?) = 0.9427760. The method appears to be stable for all w in [4, 3]. 
23. Forming A~'B and using the Power method with x® = (1,0,0, 1,0,0,1,0,0, 1)’ gives the following results: 

a. The spectral radius is approximately u“° = 0.9800021. 

b. The spectral radius is approximately 4°) = 0.9603543. 

c. The spectral radius is approximately '®) = 0.9410754. 


Exercise Set 9.4 (Page 600) 


1. Householder’s method produces the following tridiagonal matrices. 


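a. $\begin{bmatrix} 12.00000 & -10.77033 & 0.0 \\ -10.77033 & 3.862069 & 5.344828 \\ 0.0 & 5.344828 & 7.137931 \end{bmatrix}$ b. $\begin{bmatrix} 2.0000000 & 1.414214 & 0.0 \\ 1.414214 & 1.000000 & 0.0 \\ 0.0 & 0.0 & 3.0 \end{bmatrix}$

c. $\begin{bmatrix} 1.0000000 & -1.414214 & 0.0 \\ -1.414214 & 1.000000 & 0.0 \\ 0.0 & 0.0 & 1.000000 \end{bmatrix}$ d. $\begin{bmatrix} 4.750000 & -2.263846 & 0.0 \\ -2.263846 & 4.475610 & -1.219512 \\ 0.0 & -1.219512 & 5.024390 \end{bmatrix}$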
3. Householder’s method produces the following tridiagonal matrices. 
[ 2.0000000 2.8284271 1.4142136 
a. | —2.8284271 1.0000000 2.0000000 
|. 0.0000000 2.0000000 = 3.0000000 
| —1.0000000 —3.0655513 — 0.0000000 
b. | —3.6055513 —0.23076923 3.1538462 
|. 0.0000000 0.15384615 2.2307692 
| 5.0000000 4.9497475 —1.4320780 —1.5649769 
«= 1.4142136 —2.0000000 —2.4855515 1.8226448 
° 0.0000000 —5.4313902 —1.4237288 —2.6486542 
| 0.0000000 0.0000000 1.5939865 5.4237288 
| 4.0000000 1.7320508 — 0.0000000 0.0000000 
d 1.7320508 2.3333333 —0.23570226 0.40824829 
* | 0.0000000 —0.47140452 4.6666667 —0.57735027 
|_9.0000000 0.0000000 —0.0000000 5.0000000 


Exercise Set 9.5 (Page 611) 


1. Two iterations of the QR Algorithm produce the following matrices. 


a. $A^{(3)} = \begin{bmatrix} 3.142857 & -0.559397 & 0.0 \\ -0.559397 & 2.248447 & -0.187848 \\ 0.0 & -0.187848 & 0.608696 \end{bmatrix}$

b. $A^{(3)} = \begin{bmatrix} 4.549020 & 1.206958 & 0.0 \\ 1.206958 & 3.519688 & 0.000725 \\ 0.0 & 0.000725 & -0.068708 \end{bmatrix}$

c. $A^{(3)} = \begin{bmatrix} 4.592920 & -0.472934 & 0.0 \\ -0.472934 & 3.108760 & -0.232083 \\ 0.0 & -0.232083 & 1.298319 \end{bmatrix}$

d. $A^{(3)} = \begin{bmatrix} 3.071429 & 0.855352 & 0.0 & 0.0 \\ 0.855352 & 3.314192 & -1.161046 & 0.0 \\ 0.0 & -1.161046 & 3.331770 & 0.268898 \\ 0.0 & 0.0 & 0.268898 & 0.282609 \end{bmatrix}$

e. $A^{(3)} = \begin{bmatrix} -3.607843 & 0.612882 & 0.0 & 0.0 \\ 0.612882 & -1.395227 & -1.111027 & 0.0 \\ 0.0 & 0.346353 & 3.133919 & 0.346353 \\ 0.0 & 0.0 & 0.346353 & 0.869151 \end{bmatrix}$

f. $A^{(3)} = \begin{bmatrix} 1.013260 & 0.279065 & 0.0 & 0.0 \\ 0.279065 & 0.696255 & 0.107448 & 0.0 \\ 0.0 & 0.107448 & 0.843061 & 0.310832 \\ 0.0 & 0.0 & 0.310832 & 0.347424 \end{bmatrix}$

3. The matrices in Exercise 1 have the following eigenvalues, accurate to within $10^{-5}$.


a. 3.414214, 2.000000, 0.58578644 b. -0.06870782, 5.346462, 2.722246
c. 1.267949, 4.732051, 3.000000 d. 4.745281, 3.177283, 1.822717, 0.2547188
e. 3.438803, 0.8275517, -1.488068, -3.778287 f. 0.9948440, 1.189091, 0.5238224, 0.1922421


5. The matrices in Exercise 1 have the following eigenvectors, accurate to within $10^{-5}$.
a. $(-0.7071067, 1, -0.7071067)^t$, $(1, 0, -1)^t$, $(0.7071068, 1, 0.7071068)^t$
b. $(0.1741299, -0.5343539, 1)^t$, $(0.4261735, 1, 0.4601443)^t$, $(1, -0.2777544, -0.3225491)^t$
c. $(0.2679492, 0.7320508, 1)^t$, $(1, -0.7320508, 0.2679492)^t$, $(1, 1, -1)^t$


d. $(-0.08029447, -0.3007254, 0.7452812, 1)^t$, $(0.4592880, 1, -0.7179949, 0.8727118)^t$,
$(0.8727118, 0.7179949, 1, -0.4592880)^t$, $(1, -0.7452812, -0.3007254, 0.08029447)^t$


e. $(-0.01289861, -0.07015299, 0.4388026, 1)^t$, $(-0.1018060, -0.2878618, 1, -0.4603102)^t$,
$(1, 0.5119322, 0.2259932, -0.05035423)^t$, $(-0.5623391, 1, 0.2159474, -0.03185871)^t$


f. $(-0.1520150, -0.3008950, -0.05155956, 1)^t$, $(0.3627966, 1, 0.7459807, 0.3945081)^t$,
$(1, 0.09528962, -0.6907921, 0.1450703)^t$, $(0.8029403, -0.9884448, 1, -0.1237995)^t$


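7. a. Let

$$P = \begin{bmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{bmatrix}$$

and $y = Px$. Show that $\|x\|_2 = \|y\|_2$. Use the relationship $x_1 + ix_2 = re^{i\alpha}$, where $r = \|x\|_2$ and $\alpha = \tan^{-1}(x_2/x_1)$, and $y_1 + iy_2 = re^{i(\alpha + \theta)}$.


b. Let $x = (1, 0)^t$ and $\theta = \pi/4$.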
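The hint in Exercise 7 can be checked numerically. This small sketch applies the rotation to the vector from part (b) and compares the result with multiplication by $e^{i\theta}$ in the complex plane; the function name `rotate` is illustrative.

```python
import cmath
import math

def rotate(x, theta):
    """Apply the 2 x 2 rotation matrix with angle theta to the vector x = (x1, x2)."""
    c, s = math.cos(theta), math.sin(theta)
    return (c * x[0] - s * x[1], s * x[0] + c * x[1])

x = (1.0, 0.0)
theta = math.pi / 4
y = rotate(x, theta)

norm_x = math.hypot(*x)
norm_y = math.hypot(*y)
# Rotating x is the same as multiplying x1 + i*x2 by e^{i*theta}.
as_complex = complex(*x) * cmath.exp(1j * theta)
```

Multiplication by $e^{i\theta}$ changes only the argument of a complex number, not its modulus, which is the reason the 2-norm is preserved.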
9. Let $C = RQ$, where $R$ is upper triangular and $Q$ is upper Hessenberg. Then $c_{ij} = \sum_{k=1}^{n} r_{ik}q_{kj}$. Since $R$ is an upper triangular matrix, $r_{ik} = 0$ if $k < i$. Thus $c_{ij} = \sum_{k=i}^{n} r_{ik}q_{kj}$. Since $Q$ is an upper Hessenberg matrix, $q_{kj} = 0$ if $k > j + 1$. Thus, $c_{ij} = \sum_{k=i}^{j+1} r_{ik}q_{kj}$. The sum will be zero if $i > j + 1$. Hence, $c_{ij} = 0$ if $i \ge j + 2$. This means that $C$ is an upper Hessenberg matrix.


Copyright 2010 Cengage Learning. All Rights Reserved. May not be copied, scanned, or duplicated, in whole or in part. Due to electronic rights, some third party content may be suppressed from the eBook and/or eChapter(s). 
Editorial review has deemed that any suppressed content does not materially affect the overall learning experience. Cengage Learning reserves the right to remove additional content at any time if subsequent rights restrictions require it. 




11. INPUT: dimension n, matrix A = (a_{ij}), tolerance TOL, maximum number of iterations N.
OUTPUT: eigenvalues λ_1, ..., λ_n of A or a message that the number of iterations was exceeded.


Step 1 Set FLAG = 1; k1 = 1.
Step 2 While (FLAG = 1) do Steps 3-10.
  Step 3 For i = 2, ..., n do Steps 4-8.
    Step 4 For j = 1, ..., i - 1 do Steps 5-8.
      Step 5 If a_{ii} = a_{jj} then set
               CO = 0.5 √2;
               SI = CO
             else set
               b = |a_{ii} - a_{jj}|;
               c = 2 a_{ij} sign(a_{jj} - a_{ii});
               CO = (0.5 (1 + b/(c^2 + b^2)^{1/2}))^{1/2};
               SI = 0.5 c / (CO (c^2 + b^2)^{1/2}).
      Step 6 For k = 1, ..., n
               if (k ≠ i) and (k ≠ j) then
                 set x = a_{k,j};
                     y = a_{k,i};
                     a_{k,j} = CO·x + SI·y;
                     a_{k,i} = CO·y - SI·x;
                     x = a_{j,k};
                     y = a_{i,k};
                     a_{j,k} = CO·x + SI·y;
                     a_{i,k} = CO·y - SI·x.
      Step 7 Set x = a_{j,j};
                 y = a_{i,i};
                 a_{j,j} = CO·CO·x + 2·SI·CO·a_{j,i} + SI·SI·y;
                 a_{i,i} = SI·SI·x - 2·SI·CO·a_{i,j} + CO·CO·y.
      Step 8 Set a_{i,j} = 0; a_{j,i} = 0.
  Step 9 Set s = Σ_{i=1}^{n} Σ_{j=1, j≠i}^{n} |a_{ij}|.
  Step 10 If s < TOL then for i = 1, ..., n set λ_i = a_{ii};
            OUTPUT (λ_1, ..., λ_n);
            set FLAG = 0
          else set k1 = k1 + 1;
            if k1 > N then set FLAG = 0.
Step 11 If k1 > N then
          OUTPUT ('Maximum number of iterations exceeded');
          STOP.
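The steps above describe a cyclic Jacobi method for a symmetric matrix. The following Python sketch follows the same outline under some assumptions of its own: the rotation angle is chosen so that the (i, j) entry is annihilated, already-zero entries are skipped, and failure to converge raises an exception rather than printing a message. The function name and the test matrix are illustrative.

```python
import math

def jacobi_eigenvalues(A, tol=1e-10, max_sweeps=50):
    """Approximate the eigenvalues of a symmetric matrix by cyclic Jacobi rotations."""
    a = [row[:] for row in A]                      # work on a copy
    n = len(a)
    for _ in range(max_sweeps):
        for i in range(1, n):
            for j in range(i):
                if a[i][j] == 0.0:
                    continue                       # nothing to annihilate
                if a[i][i] == a[j][j]:
                    co = si = math.sqrt(0.5)       # rotation by pi/4
                else:
                    b = abs(a[i][i] - a[j][j])
                    c = 2.0 * a[i][j] * math.copysign(1.0, a[j][j] - a[i][i])
                    r = math.hypot(b, c)
                    co = math.sqrt(0.5 * (1.0 + b / r))
                    si = 0.5 * c / (co * r)
                for k in range(n):                 # update rows/columns i and j
                    if k != i and k != j:
                        x, y = a[k][j], a[k][i]
                        a[k][j] = a[j][k] = co * x + si * y
                        a[k][i] = a[i][k] = co * y - si * x
                x, y = a[j][j], a[i][i]
                a[j][j] = co * co * x + 2.0 * si * co * a[j][i] + si * si * y
                a[i][i] = si * si * x - 2.0 * si * co * a[i][j] + co * co * y
                a[i][j] = a[j][i] = 0.0
        off = sum(abs(a[i][j]) for i in range(n) for j in range(n) if i != j)
        if off < tol:
            return sorted(a[i][i] for i in range(n))
    raise RuntimeError("maximum number of sweeps exceeded")

# Illustrative symmetric tridiagonal matrix with eigenvalues 2 - sqrt(2), 2, 2 + sqrt(2).
eigs = jacobi_eigenvalues([[2.0, -1.0, 0.0], [-1.0, 2.0, -1.0], [0.0, -1.0, 2.0]])
```

Each rotation zeroes one off-diagonal pair and may partially refill others, so several sweeps are normally needed before the off-diagonal sum falls below the tolerance.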


13. a. To within $10^{-5}$, the eigenvalues are 2.618034, 3.618034, 1.381966, and 0.3819660.
b. In terms of $p$ and $\rho$ the eigenvalues are $-65.45085p/\rho$, $-90.45085p/\rho$, $-34.54915p/\rho$, and $-9.549150p/\rho$.


15. The actual eigenvalues are as follows: 


a. When $\alpha = 1/4$ we have 0.97974649, 0.92062677, 0.82743037, 0.70770751, 0.57115742, 0.42884258, 0.29229249, 0.17256963, 0.07937323, and 0.02025351.

b. When $\alpha = 1/2$ we have 0.95949297, 0.84125353, 0.65486073, 0.41541501, 0.14231484, -0.14231484, -0.41541501, -0.65486073, -0.84125353, and -0.95949297.

c. When $\alpha = 3/4$ we have 0.93923946, 0.76188030, 0.48229110, 0.12312252, -0.28652774, -0.71347226, -1.12312252, -1.48229110, -1.76188030, and -1.93923946. The method appears to be stable for $\alpha \le \frac{1}{2}$.
Exercise Set 9.6 (Page 625)


1. a. $s_1 = 1 + \sqrt{2}$, $s_2 = -1 + \sqrt{2}$ b. $s_1 = \sqrt{6}$, $s_2 = 1$
ce 5) = VI, 5 = V6 d. 5) = V7, 5 =1,53;=1 
3. a. 
y —| 70.923880 —0.382683 5 — [2414214 0 yr — | ~0-923880 —0.382683 
~ | —0.3826831 0.923880] ~ 0 0.414214|’ © ~ | 0.382683 —0.923880 
b. 
—0.912871 0 —0.408248 2.449490 0 
U = | —0.365148 —0.447214 0.816497], S= 0 1 Mea prea | 
—0.182574 0.894427 0.408248 0 0 , ‘ 
Cc. 
—0.632456 -—0.5 —0.522293 —0.277867 3.162278 0 
ties 0.316228 -—0.5 —0.301969 0.747539 fie 0 2.0 
~ | —0.316228 -0.5 0.797047. 0.121309 |’ 0 (nn i 
—0.632456 0.5 —0.027215 0.590982 0 0 
,_[-10 0.0 
. =| 0.0 —1.0 
d. 
—0.436436 0.707107 0.408248 —0.377964 2.645751 0 0 
_ 0.436436 0.707107 —0.408248 0.377964 i 0 1 0 
~ | —0.436436 0 -0.816497 —0.377964|° ~~ 0 0 1’ 
—0.654654 0 0 0.755929 0 0 0 
—0.577350 —0.577350 0.577350 
V= 0 0.707107 0.707107 
0.816497 —0.408248 0.408248 
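5. For the matrix $A$ in Example 2 we have

$$A^tA = \begin{bmatrix} 1 & 0 & 0 & 0 & 1 \\ 0 & 1 & 1 & 1 & 1 \\ 1 & 0 & 1 & 0 & 0 \end{bmatrix}\begin{bmatrix} 1 & 0 & 1 \\ 0 & 1 & 0 \\ 0 & 1 & 1 \\ 0 & 1 & 0 \\ 1 & 1 & 0 \end{bmatrix} = \begin{bmatrix} 2 & 1 & 1 \\ 1 & 4 & 1 \\ 1 & 1 & 2 \end{bmatrix}.$$

So $A^tA(1, 2, 1)^t = (5, 10, 5)^t = 5(1, 2, 1)^t$, $A^tA(1, -1, 1)^t = (2, -2, 2)^t = 2(1, -1, 1)^t$, and $A^tA(-1, 0, 1)^t = (-1, 0, 1)^t$.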
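The products in Exercise 5 are easy to confirm with integer arithmetic. In this sketch the 5 x 3 matrix `A` is read off from the second factor of the product above, and the helper names are illustrative.

```python
A = [[1, 0, 1],
     [0, 1, 0],
     [0, 1, 1],
     [0, 1, 0],
     [1, 1, 0]]

def transpose(M):
    return [list(col) for col in zip(*M)]

def matmul(X, Y):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)] for row in X]

def matvec(M, v):
    return [sum(m * x for m, x in zip(row, v)) for row in M]

AtA = matmul(transpose(A), A)

# Each pair is (eigenvalue of A^t A, eigenvector); the singular values of A are their square roots.
pairs = [(5, [1, 2, 1]), (2, [1, -1, 1]), (1, [-1, 0, 1])]
checks = [matvec(AtA, v) == [lam * x for x in v] for lam, v in pairs]
```

All three checks hold, so the eigenvalues of $A^tA$ are 5, 2, and 1.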
7. Let $A$ be an $m \times n$ matrix. Theorem 9.25 implies that $\text{Rank}(A) = \text{Rank}(A^t)$, so $\text{Nullity}(A) = n - \text{Rank}(A)$ and $\text{Nullity}(A^t) = m - \text{Rank}(A^t) = m - \text{Rank}(A)$. Hence $\text{Nullity}(A) = \text{Nullity}(A^t)$ if and only if $n = m$.
9. $\text{Rank}(S)$ is the number of nonzero entries on the diagonal of $S$. This corresponds to the number of nonzero eigenvalues (counting multiplicities) of $A^tA$. So $\text{Rank}(S) = \text{Rank}(A^tA)$, and by part (ii) of Theorem 9.26 this is the same as $\text{Rank}(A)$.
11. Because both $U^{-1} = U^t$ and $V^{-1} = V^t$ exist, $A = USV^t$ implies that $A^{-1} = (USV^t)^{-1} = VS^{-1}U^t$ if and only if $S^{-1}$ exists.
13. Yes. By Theorem 9.25 we have $\text{Rank}(A^tA) = \text{Rank}((A^tA)^t) = \text{Rank}(AA^t)$. Applying part (iii) of Theorem 9.26 gives $\text{Rank}(AA^t) = \text{Rank}(A^tA) = \text{Rank}(A)$.
15. If the $n \times n$ matrix $A$ has the singular values $s_1 \ge s_2 \ge \cdots \ge s_n > 0$, then $\|A\|_2 = \sqrt{\rho(A^tA)} = s_1$. In addition, the singular values of $A^{-1}$ are $\frac{1}{s_n} \ge \cdots \ge \frac{1}{s_2} \ge \frac{1}{s_1} > 0$, so $\|A^{-1}\|_2 = \sqrt{\frac{1}{s_n^2}} = \frac{1}{s_n}$. Hence $K_2(A) = \|A\|_2 \cdot \|A^{-1}\|_2 = s_1/s_n$.
17. a.
1 1 7.691213 
1 2 0 
A= 1) 3) ie OS 0 
1 4 0 
1 5 0 
and 
0.160007 0.757890 
0.285308 0.467546 
U= | 0.410609 0.177202 
0.535909 —0.113142 
0.661210 —0.403486 
This produces P(x) = 0.33 + 1.29x. 
b. 
1 1 1 
12 4 
A= 13 9), S= 
1 4 16 
1 5 25 
—0.055273 
V'=| —0.602286 
0.796364 
and 
—0.038954 —0.527903 
—0.136702 —0.589038 
U=j —0.294961 —0.457453 
—0.513732 —0.133148 
—0.793015 0.383877 


This produces P(x) = 0.18 + 1.418571x — 0.0214286x’. 


Exercise Set 10.1 (Page 636) 


1. Use Theorem 10.5. 


. Use Theorem 10.5 for each of the partial derivatives. 


0 
0.919370 


0 , 
0 
0 


—0.414912 
0.067225 
0.837705 

—0.217438 

—0.272580 


32.15633 
0 


0 
0 
0 


—0.224442 
—0.769677 
—0.597681 


0.778148 
—0.075997 
—0.435258 
—0.299632 

0.330878 




Va | 0.266934 0.963715 
0.963715 —0.266934 
—0.362646 —0.310381 
0.399603 0.731982 
—0.201287 —0.240279 
0.654348 —0.473867 
—0.490018 0.292544 
0 0 
2.197733 0 
0 0.374376 |, 
0 0 
0 0 
—0.972919 
0.211773 
0.092637 
—0.008907 —0.337944 
0.243571 0.754483 
—0.677268 —0.235783 
0.659453 —0.440105 
—0.216849 0.259350 


b. With x = (0,0)' and tolerance 10-°, we have x") = (0.9999973, 0.9999973)'. 

c. With x = (0,0)! and tolerance 10-°, we have x“! = (0.9999984, 0.9999991)'. 
7. a. With x = (1, 1,1)’, we have x© = (5.0000000, 0.0000000, —0.5235988)'. 

b. With x = (1,1, 1)’, we have x® = (1.0364011, 1.0857072, 0.93119113)'. 

c. With x = (0,0, 0.5)’, we have x® = (0.00000000, 0.09999999, 1.0000000)’. 

d. With x = (0,0,0)', we have x® = (0.49814471, —0.19960600, —0.52882595)’. 
9. a. With x = (1,1, 1)’, we have x® = (0.5000000, 0, —0.5235988)’. 

b. With x = (1,1, 1‘, we have x“ = (1.036400, 1.085707, 0.9311914)'. 

c. With x© = (0,0,0)', we have x® = (0,0.1000000, 1.0000000)'. 

d. With x© = (0,0,0)’, we have x = (0.4981447, —0.1996059, —0.5288260)’. 

11. A stable solution occurs when x; = 8000 and x, = 4000. 
13. In this situation we have, for any matrix norm,

$$\|F(x) - F(x_0)\| = \|Ax - Ax_0\| = \|A(x - x_0)\| \le \|A\| \cdot \|x - x_0\|.$$

The result follows by selecting $\delta = \varepsilon/\|A\|$, provided that $\|A\| \ne 0$. When $\|A\| = 0$, $\delta$ can be arbitrarily chosen, because $A$ is the zero matrix.


Exercise Set 10.2 (Page 644) 


1. a. x® = (0.4958936, 1.983423)! b. x® = (—0.5131616, —0.01837622)' 
c. x? = (—23.942626, 7.6086797)' d. x) cannot be computed since J(0) is singular. 

3. a. (0.5, 0.2)! and (1.1,6.1)' b. (—0.35, 0.05)’, (0.2, —0.45)', (0.4, —0.5)' and (1, —0.3)' 
ce. (—1,3.5)', (2.5, 4)! d. (0.11,0.27)' 

5. a. With x = (0.5,2)',x® = (0.5,2)' With x = (1.1,6.1),x® = (1.0967197, 6.0409329)' 
b. With x = (—0.35,0.05)',x® = (—0.37369822, 0.056266490'. 


13. 


With x© = (0.2, —0.45)',x = (0.14783924, —0.43617762)'. 
With x® = (0.4, —0.5)',x™ = (0.40809566, —0.49262939)'. 
With x© = (1, —0.3)',x = (1.0330715, —0.27996184)' 
ce. With x© = (—1,3.5)!, x = (—1,3.5)! and x® = (2.5, 4)', x = (2.546947, 3.984998)’. 
d. With x© = (0.11,0.27)', x® = (0.1212419, 0.2711051)!. 
a. x = (0.5000000, 0.8660254)' b. x© = (1.772454, 1.772454)! 
ce. x = (—1.456043, — 1.664230, 0.4224934)' d. x® = (0.4981447, —0.1996059, —0.5288260)' 


. With x = (1,1 — 1)! and TOL = 10~°, we have x” = (0.5,9.5 x 1077, —0.5235988)’. 
11. When the dimension $n$ is 1, $F(x)$ is a one-component function $f(x) = f_1(x)$, and the vector $x$ has only one component $x_1 = x$. In this case, the Jacobian matrix $J(x)$ reduces to the $1 \times 1$ matrix $\left[\frac{\partial f_1}{\partial x_1}(x)\right] = f_1'(x) = f'(x)$. Thus the vector equation

\[ x^{(k)} = x^{(k-1)} - J(x^{(k-1)})^{-1} F(x^{(k-1)}) \]

becomes the scalar equation

\[ x_k = x_{k-1} - f'(x_{k-1})^{-1} f(x_{k-1}) = x_{k-1} - \frac{f(x_{k-1})}{f'(x_{k-1})}. \]
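
The reduction in Exercise 11 can be checked numerically. The Python sketch below is an illustration only: the names `newton_system` and `newton_scalar` and the test function f(x) = x^2 - 2 are assumptions made for this example and do not come from the text. It runs the vector form of Newton's method with a 1 x 1 Jacobian next to the scalar formula and compares the iterates.

```python
def newton_system(F, J, x0, steps):
    """Vector form of Newton's method, restricted here to n = 1.

    F maps a 1-element list to a 1-element list; J returns a 1 x 1 matrix.
    """
    x = list(x0)
    for _ in range(steps):
        jac = J(x)[0][0]            # the only entry of the 1 x 1 Jacobian
        x = [x[0] - F(x)[0] / jac]  # x - J(x)^{-1} F(x)
    return x[0]


def newton_scalar(f, df, x0, steps):
    """Scalar Newton iteration x_k = x_{k-1} - f(x_{k-1}) / f'(x_{k-1})."""
    x = x0
    for _ in range(steps):
        x = x - f(x) / df(x)
    return x


# Hypothetical test problem: f(x) = x^2 - 2, whose positive root is sqrt(2).
vec = newton_system(lambda x: [x[0] ** 2 - 2], lambda x: [[2 * x[0]]], [1.0], 6)
sca = newton_scalar(lambda x: x ** 2 - 2, lambda x: 2 * x, 1.0, 6)
print(vec, sca)
```

Both calls perform the same arithmetic at every step, which is the content of the exercise: inverting a 1 x 1 Jacobian is division by f'(x).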
With o = 1, for each i= 1,2,...,20, the following results are obtained. 
i 1 2 3 4 5 6 
a 0.14062 0.19954 0.24522 0.28413 0.31878 0.35045 
i 7 8 9 10 11 12 13 


a 0.37990 0.40763 0.43398 0.45920 0.48348 0.50697 0.52980 


i 14 15 16 17 18 19 20 


a 0.55205 0.57382 0.59516 0.61615 0.63683 0.65726 0.67746 


Exercise Set 10.3 (Page 652) 


1. a. x® = (0.4777920, 1.927557)! b. x® = (—0.3250070, —0.1386967)' 
ce. x® = (0.52293721, 0.82434906)! d. x® = (1.77949990, 1.74339606)! 

3. a. x® = (0.5, 2). b. x = (—0.3736982, 0.05626649)'. 
ce. x = (0.5, 0.8660254)! d. x® = (1.772454, 1.772454)! 
5. a. With x = (2.5, 4)’, we have x = (2.546947, 3.984998)’. 
b. With x© = (0.11,0.27)', we have x = (0.1212419, 0.2711052)'. 
c. With x = (1,1, 1)', we have x® = (1.036401, 1.085707, 0.9311914)’. 
d. With x© = (1,—1, 1)’, we have x® = (0.9, —1,0.5)'; and with x = (1,1,—1)', we have x® = (0.5, 1, —0.5)’. 

7. With x = (1,1 — 1)‘, we have x©® = (0.5000591, 0.01057235, —0.5224818)'. 

9. Let $\lambda$ be an eigenvalue of $M = (I + uv^t)$ with eigenvector $x \ne 0$. Then $\lambda x = Mx = (I + uv^t)x = x + (v^t x)u$. Thus, $(\lambda - 1)x = (v^t x)u$. If $\lambda = 1$, then $v^t x = 0$. So $\lambda = 1$ is an eigenvalue of $M$ with multiplicity $n - 1$ and eigenvectors $x^{(1)}, \ldots, x^{(n-1)}$, where $v^t x^{(j)} = 0$, for $j = 1, \ldots, n - 1$. Assuming $\lambda \ne 1$ implies $x$ and $u$ are parallel. Suppose $x = \alpha u$. Then $(\lambda - 1)\alpha u = (v^t(\alpha u))u$. Thus, $\alpha(\lambda - 1)u = \alpha(v^t u)u$, which implies that $\lambda - 1 = v^t u$ or $\lambda = 1 + v^t u$. Hence, $M$ has eigenvalues $\lambda_i$, $1 \le i \le n$, where $\lambda_i = 1$, for $i = 1, \ldots, n - 1$, and $\lambda_n = 1 + v^t u$. Since $\det M = \prod_{i=1}^{n} \lambda_i$, we have $\det M = 1 + v^t u$.
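
The identity det(I + uv^t) = 1 + v^t u from Exercise 9 can be spot-checked on a small case. The sketch below is only an illustration: the helper `det` (Gaussian elimination with partial pivoting) and the vectors `u` and `v` are assumptions chosen for the example, not data from the text.

```python
def det(matrix):
    """Determinant by Gaussian elimination with partial pivoting."""
    a = [row[:] for row in matrix]  # work on a copy
    n = len(a)
    d = 1.0
    for c in range(n):
        # choose the row with the largest pivot candidate in column c
        p = max(range(c, n), key=lambda r: abs(a[r][c]))
        if a[p][c] == 0.0:
            return 0.0
        if p != c:
            a[c], a[p] = a[p], a[c]
            d = -d                  # a row swap flips the sign
        d *= a[c][c]
        for r in range(c + 1, n):
            m = a[r][c] / a[c][c]
            for k in range(c, n):
                a[r][k] -= m * a[c][k]
    return d


# Hypothetical example vectors.
u = [1.0, 2.0, -1.0]
v = [0.5, -1.0, 3.0]
n = len(u)

# M = I + u v^t
M = [[(1.0 if i == j else 0.0) + u[i] * v[j] for j in range(n)] for i in range(n)]

lhs = det(M)
rhs = 1.0 + sum(vi * ui for vi, ui in zip(v, u))
print(lhs, rhs)
```

For these vectors v^t u = -4.5, so both sides should agree at -3.5 up to rounding.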

11. With x© = (0.75, 1.25)’, we have x = (0.7501948, 1.184712)’. Thus, a = 0.7501948, b = 1.184712, and the error is 
19.796. 


Exercise Set 10.4 (Page 659) 


1. a. With x® = (0,0)', we have x" = (0.4943541, 1.948040)’. 
b. With x = (1, 1)', we have x = (0.4970073, 0.8644143) . 
c. With x = (2,2)', we have x) = (1.736083, 1.804428)’. 
d. With x = (0,0), we have x™ = (—0.3610092, 0.05788368) . 
3. a. x® = (0.5,2)' b. x® = (0.5, 0.8660254)' 
ce. x = (1.772454, 1.772454) d. x® = (—0.3736982, 0.05626649)' 
5. a. x9) = (1.036400, 1.085707, 0.9311914)! b. x® = (0.5, 1,-0.5) 
ce. x® = (—1.456043, — 1.664230, 0.4224934)' d. x = (0.0000000, 0.10000001, 1.0000000)' 


Exercise Set 10.5 (Page 666) 


1. a. (3, —2.25)' b. (0.42105263, 2.6184211)' ce. (2.173110, —1.3627731)! 
3. Using x(0) = 0 in all parts gives: 
a. (0.44006047, 1.8279835)' b. (—0.41342613, 0.096669468)' 
c. (0.49858909, 0.24999091, —0.52067978)' d. (6.1935484, 18.532258, —21.725806)' 
5. a. With x0) = (—1, 3.5)’ the result is (—1, 3.5)’. With x(0) = (2.5, 4)’ the result is (—1, 3.5)’. 
b. With x(0) = (0.11, 0.27)! the result is (0.12124195, 0.27110516)’. 
c. With x(0) = (1,1, 1)’ the result is (1.03640047, 1.08570655, 0.93119144)’. 
d. With x(0) = (1,—1, 1)’ the result is (0.90016074, —1.00238008, 0.496610937)'. With x(0) = (1, 1, —1)’ the result is 
(0.50104035, 1.00238008, —0.49661093)'. 
7. a. With x(0) = (−1, 3.5)^t the result is (−1, 3.5)^t. With x(0) = (2.5, 4)^t the result is (2.5469465, 3.9849975)^t.
b. With x(0) = (0.11, 0.27)^t the result is (0.12124191, 0.27110516)^t.
c. With x(0) = (1, 1, 1)^t the result is (1.03640047, 1.08570655, 0.93119144)^t.
d. With x(0) = (1, −1, 1)^t the result is (0.90015964, −1.00021826, 0.49968944)^t.
With x(0) = (1, 1, −1)^t the result is (0.5009653, 1.00021826, −0.49968944)^t.
9. (0.50024553, 0.078230039, −0.52156996)^t


11. For each $\lambda$, we have


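\[ 0 = G(\lambda, x(\lambda)) = F(x(\lambda)) - e^{-\lambda} F(x(0)), \]

so

\[ 0 = \frac{\partial F(x(\lambda))}{\partial x}\,\frac{dx}{d\lambda} + e^{-\lambda} F(x(0)) = J(x(\lambda))\,x'(\lambda) + e^{-\lambda} F(x(0)) \]

and

\[ J(x(\lambda))\,x'(\lambda) = -e^{-\lambda} F(x(0)), \]

which at $\lambda = 0$ gives $J(x(0))\,x'(0) = -F(x(0))$. Thus

\[ x'(0) = -J(x(0))^{-1} F(x(0)). \]

With $N = 1$, we have $h = 1$ so that

\[ x(1) = x(0) - J(x(0))^{-1} F(x(0)). \]

However, Newton's method gives

\[ x^{(1)} = x^{(0)} - J(x^{(0)})^{-1} F(x^{(0)}). \]

Since $x(0) = x^{(0)}$, we have $x(1) = x^{(1)}$.

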
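
The conclusion of Exercise 11 can be illustrated on a small system. The sketch below is an assumption-laden example, not the book's algorithm: the 2 x 2 system `F`, its Jacobian `J`, the starting point, and the helper names are all invented here, and the single step from lambda = 0 with h = 1 is taken as a forward-Euler step.

```python
def solve_2x2(J, r):
    """Solve J s = r for a 2 x 2 matrix J by Cramer's rule."""
    (a, b), (c, d) = J
    det = a * d - b * c
    return [(d * r[0] - b * r[1]) / det, (a * r[1] - c * r[0]) / det]


def F(x):
    # Hypothetical test system: x1^2 + x2^2 = 4 and x1 * x2 = 1.
    return [x[0] ** 2 + x[1] ** 2 - 4.0, x[0] * x[1] - 1.0]


def J(x):
    # Jacobian of the hypothetical system above.
    return [[2.0 * x[0], 2.0 * x[1]], [x[1], x[0]]]


def continuation_one_step(F, J, x0):
    """One forward-Euler step with h = 1 using x'(0) = -J(x(0))^{-1} F(x(0))."""
    h = 1.0
    s = solve_2x2(J(x0), F(x0))
    slope = [-s[0], -s[1]]
    return [x0[0] + h * slope[0], x0[1] + h * slope[1]]


def newton_step(F, J, x0):
    """One Newton step x^{(1)} = x^{(0)} - J(x^{(0)})^{-1} F(x^{(0)})."""
    s = solve_2x2(J(x0), F(x0))
    return [x0[0] - s[0], x0[1] - s[1]]


x0 = [2.0, 0.5]
x_cont = continuation_one_step(F, J, x0)
x_newt = newton_step(F, J, x0)
print(x_cont, x_newt)
```

The two results coincide because with N = 1 the step length is 1 and the slope at lambda = 0 is exactly the Newton correction.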
Exercise Set 11.1 (Page 677) 


1. The Linear Shooting Algorithm gives the results in the following tables. 


a.
i    x_i     w_{1i}        y(x_i)
1    0.5     0.82432432    0.82402714

b.
i    x_i     w_{1i}        y(x_i)
1    0.25    0.3937095     0.3936767
2    0.50    0.8240948     0.8240271
3    0.75    1.337160      1.337086
3. The Linear Shooting Algorithm gives the results in the following tables. 
a.
i    x_i     w_{1i}        y(x_i)
3    0.3     0.7833204     0.7831923
6    0.6     0.6023521     0.6022801
9    0.9     0.8568906     0.8568760

b.
i    x_i     w_{1i}        y(x_i)
5    1.25    0.1676179     0.1676243
10   1.50    0.4581901     0.4581935
15   1.75    0.6077718     0.6077740

c.
i    x_i     w_{1i}        y(x_i)
3    0.3     −0.5185754    −0.5185728
6    0.6     −0.2195271    −0.2195247
9    0.9     −0.0406577    −0.0406570

d.
i    x_i     w_{1i}        y(x_i)
3    1.3     0.0655336     0.06553420
6    1.6     0.0774590     0.07745947
9    1.9     0.0305619     0.03056208


5. The Linear Shooting Algorithm with h = 0.05 gives the following results.


i    x_i    w_{1i}
6    0.3    0.04990547
10   0.5    0.00673795
16   0.8    0.00033755


The Linear Shooting Algorithm with h = 0.1 gives the following results.


i    x_i    w_{1i}
3    0.3    0.05273437
5    0.5    0.00741571
8    0.8    0.00038976
7. a. The approximate potential is u(3) ≈ 36.66702 using h = 0.1.
b. The actual potential is u(3) = 36.66667.
9. a. There are no solutions if b is an integer multiple of π and B ≠ 0.
b. A unique solution exists whenever b is not an integer multiple of π.
c. There is an infinite number of solutions if b is an integer multiple of π and B = 0.


Exercise Set 11.2 (Page 684) 


1. The Nonlinear Shooting Algorithm gives w_1 = 0.405505 ≈ ln 1.5 = 0.405465. 
3. The Nonlinear Shooting Algorithm gives the results in the following tables. 


a. i X; Wii yx) Wi 
2 1.20000000 0.18232094 0.18232156 0.83333370 
4 1.40000000 0.33647129 0.33647224 0.71428547 
6 1.60000000 0.47000243 0.47000363 0.62499939 
8 1.80000000 0.58778522 0.58778666 0.55555468 
Convergence in 4 iterations t = 1.0000017. 

b. i X; Wii yx) Wo; 
2 0.31415927 1.36209813 1.36208552 1.29545926 
4 0.62831853 1.80002060 1.79999746 1.45626846 
6 0.94247780 2.24572329 2.24569937 1.32001776 
8 1.25663706 2.58845757 2.58844295 0.79988757 
Convergence in 4 iterations tf = 1.0000301. 

Ce i Xi Wij y(Xxj) W; 
1 0.83775804 0.86205941 0.86205848 0.38811718 
2 0.89011792 0.88156057 0.88155882 0.35695076 
3 0.94247780 0.89945618 0.89945372 0.32675844 
4 0.99483767 0.91579268 0.91578959 0.29737141 
Convergence in 3 iterations t = 0.42046725. 

d. i Xj Wii yi) Wi 

4 0.62831853 2.58784539 2.58778525 0.80908243 
8 1.25663706 2.95114591 2.95105652 0.30904693 

12 1.88495559 2.95115520 2.95105652 —0.30901625 
16 2.51327412 2.58787536 2.58778525 —0.80904433 


Convergence in 6 iterations t = 1.0001253. 


5. a. Modify Algorithm 11.2 as follows:

Step 1  Set h = (b − a)/N;
            k = 2;
            TK1 = (β − α)/(b − a).

Step 2  Set w_{1,0} = α;
            w_{2,0} = TK1.

Step 3  For i = 1, ..., N do Steps 4 and 5.

    Step 4  Set x = a + (i − 1)h.

    Step 5  Set
            k_{1,1} = h w_{2,i−1};
            k_{1,2} = h f(x, w_{1,i−1}, w_{2,i−1});
            k_{2,1} = h(w_{2,i−1} + k_{1,2}/2);
            k_{2,2} = h f(x + h/2, w_{1,i−1} + k_{1,1}/2, w_{2,i−1} + k_{1,2}/2);
            k_{3,1} = h(w_{2,i−1} + k_{2,2}/2);
            k_{3,2} = h f(x + h/2, w_{1,i−1} + k_{2,1}/2, w_{2,i−1} + k_{2,2}/2);
            k_{4,1} = h(w_{2,i−1} + k_{3,2});
            k_{4,2} = h f(x + h, w_{1,i−1} + k_{3,1}, w_{2,i−1} + k_{3,2});
            w_{1,i} = w_{1,i−1} + (k_{1,1} + 2k_{2,1} + 2k_{3,1} + k_{4,1})/6;
            w_{2,i} = w_{2,i−1} + (k_{1,2} + 2k_{2,2} + 2k_{3,2} + k_{4,2})/6.

Step 6  Set TK2 = TK1 + (β − w_{1,N})/(b − a).

Step 7  While (k ≤ M) do Steps 8-15.

    Step 8  Set w_{2,0} = TK2;
                HOLD = w_{1,N}.

    Step 9  For i = 1, ..., N do Steps 10 and 11.

        Step 10  (Same as Step 4)
        Step 11  (Same as Step 5)

    Step 12  If |w_{1,N} − β| ≤ TOL then do Steps 13 and 14.

        Step 13  For i = 0, ..., N set x = a + ih;
                     OUTPUT(x, w_{1,i}, w_{2,i}).

        Step 14  STOP.

    Step 15  Set
             TK = TK2 − (w_{1,N} − β)(TK2 − TK1)/(w_{1,N} − HOLD);
             TK1 = TK2;
             TK2 = TK;
             k = k + 1.

Step 16  OUTPUT('Maximum number of iterations exceeded.');
         STOP.
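
The modified algorithm in part (a) can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the book's code: the function name `shoot_secant`, its default parameters, and the test problem y'' = y^3 − y y' on [1, 2] with y(1) = 1/2, y(2) = 1/3 (whose solution y = 1/(x + 1) can be confirmed by differentiation) are choices made for this example. The fourth Runge-Kutta stage uses x + h and the full increments, as in the classical fourth-order method.

```python
def shoot_secant(f, a, b, alpha, beta, N=10, tol=1e-8, max_iter=25):
    """Nonlinear shooting for y'' = f(x, y, y'), y(a) = alpha, y(b) = beta,
    with the initial slope updated by the Secant method."""
    h = (b - a) / N

    def integrate(slope):
        # Classical RK4 on the system y' = z, z' = f(x, y, z).
        w1, w2 = [alpha], [slope]
        for i in range(1, N + 1):
            x = a + (i - 1) * h
            y, z = w1[-1], w2[-1]
            k11 = h * z
            k12 = h * f(x, y, z)
            k21 = h * (z + k12 / 2)
            k22 = h * f(x + h / 2, y + k11 / 2, z + k12 / 2)
            k31 = h * (z + k22 / 2)
            k32 = h * f(x + h / 2, y + k21 / 2, z + k22 / 2)
            k41 = h * (z + k32)
            k42 = h * f(x + h, y + k31, z + k32)
            w1.append(y + (k11 + 2 * k21 + 2 * k31 + k41) / 6)
            w2.append(z + (k12 + 2 * k22 + 2 * k32 + k42) / 6)
        return w1, w2

    tk1 = (beta - alpha) / (b - a)            # first slope guess (TK1)
    w1, w2 = integrate(tk1)
    hold = w1[-1]                             # endpoint value for TK1 (HOLD)
    tk2 = tk1 + (beta - w1[-1]) / (b - a)     # second slope guess (TK2)
    for _ in range(2, max_iter + 1):
        w1, w2 = integrate(tk2)
        if abs(w1[-1] - beta) <= tol:
            xs = [a + i * h for i in range(N + 1)]
            return xs, w1, w2
        # Secant update on the endpoint mismatch w_{1,N} - beta.
        tk = tk2 - (w1[-1] - beta) * (tk2 - tk1) / (w1[-1] - hold)
        tk1, tk2, hold = tk2, tk, w1[-1]
    raise RuntimeError("Maximum number of iterations exceeded.")


# Assumed test problem with exact solution y(x) = 1 / (x + 1).
xs, w1, w2 = shoot_secant(lambda x, y, z: y ** 3 - y * z, 1.0, 2.0, 0.5, 1.0 / 3.0)
for x, y in zip(xs, w1):
    print(f"{x:.1f}  {y:.8f}  {1.0 / (x + 1.0):.8f}")
```

Compared with the Newton-based version, the Secant update needs only the original initial-value problem, so no second system for the derivative with respect to the slope has to be integrated.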


b. (3a) 3 iterations:

i    x_i    w_{1i}        y(x_i)
1    1.2    0.45453896    0.45454545
2    1.4    0.41665348    0.41666667
3    1.6    0.38459538    0.38461538
4    1.8    0.35711592    0.35714286

(3c) 3 iterations:

i    x_i    w_{1i}        y(x_i)
1    2.2    1.24299575    1.24300281
2    2.4    1.29211897    1.29213540
3    2.6    1.34009800    1.34012683
4    2.8    1.38671706    1.38676227

Exercise Set 11.3 (Page 689) 


1. The Linear Finite-Difference Algorithm gives the following results.

a.
i    x_i     w_i           y(x_i)
1    0.5     0.83333333    0.82402714

b.
i    x_i     w_i           y(x_i)
1    0.25    0.39512472    0.39367669
2    0.5     0.82653061    0.82402714
3    0.75    1.33956916    1.33708613

c. [4(0.82653061) − 0.83333333]/3 = 0.82426304
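
The extrapolated value in Exercise 1 combines the two approximations at x = 0.5 (from h = 0.5 and h = 0.25). The short Python sketch below repeats that arithmetic; the function name `richardson` is an assumption for this example, and the formula presumes the leading error term of the finite-difference approximation is proportional to h^2.

```python
def richardson(w_h, w_half):
    """Combine second-order approximations computed with steps h and h/2."""
    return (4.0 * w_half - w_h) / 3.0


w_h = 0.83333333      # x = 0.5, h = 0.5 (part a)
w_half = 0.82653061   # x = 0.5, h = 0.25 (part b)
exact = 0.82402714    # y(0.5) from the tables

ext = richardson(w_h, w_half)
print(f"extrapolated value: {ext:.8f}")
print(f"error before: {abs(w_half - exact):.2e}, after: {abs(ext - exact):.2e}")
```

The extrapolated value is noticeably closer to y(0.5) than either finite-difference approximation.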


Copyright 2010 Cengage Learning. All Rights Reserved. May not be copied, scanned, or duplicated, in whole or in part. Due to electronic rights, some third party content may be suppressed from the eBook and/or eChapter(s). 
Editorial review has deemed that any suppressed content does not materially affect the overall learning experience. Cengage Learning reserves the right to remove additional content at any time if subsequent rights restrictions require it. 




3. The Linear Finite-Difference Algorithm gives the results in the following tables. 


a.
i    x_i     w_i           y(x_i)
2    0.2     1.018096      1.0221404
5    0.5     0.5942743     0.59713617
7    0.7     0.6514520     0.65290384

b.
i    x_i     w_i           y(x_i)
5    1.25    0.16797186    0.16762427
10   1.50    0.45842388    0.45819349
15   1.75    0.60787334    0.60777401

c.
i    x_i     w_i           y(x_i)
3    0.3     −0.5183084    −0.5185728
6    0.6     −0.2192657    −0.2195247
9    0.9     −0.0405748    −0.04065697

d.
i    x_i     w_i           y(x_i)
3    1.3     0.0654387     0.0655342
6    1.6     0.0773936     0.0774595
9    1.9     0.0305465     0.0305621


5. The Linear Finite-Difference Algorithm gives the results in the following tables. 


i    x_i    w_i (h = 0.1)        i     x_i    w_i (h = 0.05)
3    0.3    0.05572807           6     0.3    0.05132396
6    0.6    0.00310518           12    0.6    0.00263406
9    0.9    0.00016516           18    0.9    0.00013340


7. a. The approximate deflections are shown in the following table. 


i x; Wii 
5 30 0.0102808 
10 60 0.0144277 
15 90 0.0102808 
b. Yes. 


c. Yes. Maximum deflection occurs at x = 60. The exact solution is within tolerance, but the approximation is not. 


Exercise Set 11.4 (Page 696) 


1. The Nonlinear Finite-Difference Algorithm gives the following results. 


i    x_i    w_i          y(x_i)
1    1.5    0.4067967    0.4054651


3. The Nonlinear Finite-Difference Algorithm gives the results in the following tables. 


a.
i    x_i           w_i           y(x_i)
2    1.20000000    0.18220299    0.18232156
4    1.40000000    0.33632929    0.33647224
6    1.60000000    0.46988413    0.47000363
8    1.80000000    0.58771808    0.58778666
Convergence in 3 iterations

b.
i    x_i           w_i           y(x_i)
2    0.31415927    1.36244080    1.36208552
4    0.62831853    1.80138559    1.79999746
6    0.94247780    2.24819259    2.24569937
8    1.25663706    2.59083695    2.58844295
Convergence in 3 iterations

c.
i    x_i           w_i           y(x_i)
1    0.83775804    0.86205907    0.86205848
2    0.89011792    0.88155964    0.88155882
3    0.94247780    0.89945447    0.89945372
4    0.99483767    0.91579005    0.91578959
Convergence in 2 iterations

d.
i    x_i           w_i           y(x_i)
4    0.62831853    2.58932301    2.58778525
8    1.25663706    2.95378037    2.95105652
12   1.88495559    2.95378037    2.95105652
16   2.51327412    2.58932301    2.58778525
Convergence in 4 iterations
5. b. For (4a)

x_i    w_i(h = 0.2)    w_i(h = 0.1)    w_i(h = 0.05)    EXT_{1,i}     EXT_{2,i}     EXT_{3,i}
1.2    0.45458862      0.45455753      0.45454935       0.45454717    0.45454662    0.45454659
1.4    0.41672067      0.41668202      0.41667179       0.41666914    0.41666838    0.41666833
1.6    0.38466137      0.38462855      0.38461984       0.38461761    0.38461694    0.38461689
1.8    0.35716943      0.35715045      0.35714542       0.35714412    0.35714374    0.35714372

For (4c)

x_i    w_i(h = 0.2)    w_i(h = 0.1)    w_i(h = 0.05)    EXT_{1,i}    EXT_{2,i}    EXT_{3,i}
1.2    2.0340273       2.0335158       2.0333796        2.0333453    2.0333342    2.0333334
1.4    2.1148732       2.1144386       2.1143243        2.1142937    2.1142863    2.1142858
1.6    2.2253630       2.2250937       2.2250236        2.2250039    2.2250003    2.2250000
1.8    2.3557284       2.3556001       2.3555668        2.3555573    2.3555556    2.3555556


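7. The Jacobian matrix $J = (a_{i,j})$ is tridiagonal with entries given in (11.21). So

\[
\begin{aligned}
a_{1,1} &= 2 + h^2 f_y\left(x_1, w_1, \tfrac{1}{2h}(w_2 - \alpha)\right),\\
a_{1,2} &= -1 + \tfrac{h}{2} f_{y'}\left(x_1, w_1, \tfrac{1}{2h}(w_2 - \alpha)\right),\\
a_{i,i-1} &= -1 - \tfrac{h}{2} f_{y'}\left(x_i, w_i, \tfrac{1}{2h}(w_{i+1} - w_{i-1})\right), \quad \text{for } 2 \le i \le N-1,\\
a_{i,i} &= 2 + h^2 f_y\left(x_i, w_i, \tfrac{1}{2h}(w_{i+1} - w_{i-1})\right), \quad \text{for } 2 \le i \le N-1,\\
a_{i,i+1} &= -1 + \tfrac{h}{2} f_{y'}\left(x_i, w_i, \tfrac{1}{2h}(w_{i+1} - w_{i-1})\right), \quad \text{for } 2 \le i \le N-1,\\
a_{N,N-1} &= -1 - \tfrac{h}{2} f_{y'}\left(x_N, w_N, \tfrac{1}{2h}(\beta - w_{N-1})\right),\\
a_{N,N} &= 2 + h^2 f_y\left(x_N, w_N, \tfrac{1}{2h}(\beta - w_{N-1})\right).
\end{aligned}
\]

Thus, $|a_{i,i}| \ge 2 + h^2\delta$, for $i = 1, \ldots, N$. Since $|f_{y'}(x, y, y')| \le L$ and $h < 2/L$,

\[ \left|\tfrac{h}{2} f_{y'}(x, y, y')\right| \le \frac{hL}{2} < 1. \]

So

\[ |a_{1,2}| = \left|-1 + \tfrac{h}{2} f_{y'}\left(x_1, w_1, \tfrac{1}{2h}(w_2 - \alpha)\right)\right| < 2 < |a_{1,1}|, \]

\[
\begin{aligned}
|a_{i,i-1}| + |a_{i,i+1}| &= -a_{i,i-1} - a_{i,i+1}\\
&= 1 + \tfrac{h}{2} f_{y'}\left(x_i, w_i, \tfrac{1}{2h}(w_{i+1} - w_{i-1})\right) + 1 - \tfrac{h}{2} f_{y'}\left(x_i, w_i, \tfrac{1}{2h}(w_{i+1} - w_{i-1})\right)\\
&= 2 < |a_{i,i}|,
\end{aligned}
\]

and

\[ |a_{N,N-1}| = -a_{N,N-1} = 1 + \tfrac{h}{2} f_{y'}\left(x_N, w_N, \tfrac{1}{2h}(\beta - w_{N-1})\right) < 2 < |a_{N,N}|. \]

By Theorem 6.31, the matrix $J$ is nonsingular.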


Exercise Set 11.5 (Page 710) 
1. The Piecewise Linear Algorithm gives φ(x) = −0.07713274φ_1(x) − 0.07442678φ_2(x). The actual values are
y(x_1) = −0.07988545 and y(x_2) = −0.07712903.


3. The Piecewise Linear Algorithm gives the results in the following tables.

a.
i    x_i     φ(x_i)        y(x_i)
3    0.3     −0.212333     −0.21
6    0.6     −0.241333     −0.24
9    0.9     −0.090333     −0.09

b.
i    x_i     φ(x_i)        y(x_i)
3    0.3     0.1815138     0.1814273
6    0.6     0.1805502     0.1804753
9    0.9     0.05936468    0.05934303

c.
i    x_i     φ(x_i)        y(x_i)
5    0.25    −0.3585989    −0.3585641
10   0.50    −0.5348383    −0.5347803
15   0.75    −0.4510165    −0.4509614

d.
i    x_i     φ(x_i)        y(x_i)
5    0.25    −0.1846134    −0.1845204
10   0.50    −0.2737099    −0.2735857
15   0.75    −0.2285169    −0.2284204


5. The Cubic Spline Algorithm gives the results in the following tables.

a.
i    x_i     φ(x_i)        y(x_i)
3    0.3     −0.2100000    −0.21
6    0.6     −0.2400000    −0.24
9    0.9     −0.0900000    −0.09

b.
i    x_i     φ(x_i)        y(x_i)
3    0.3     0.1814269     0.1814273
6    0.6     0.1804753     0.1804754
9    0.9     0.05934321    0.05934303

c.
i    x_i     φ(x_i)        y(x_i)
5    0.25    −0.3585639    −0.3585641
10   0.50    −0.5347779    −0.5347803
15   0.75    −0.4509109    −0.4509614

d.
i    x_i     φ(x_i)        y(x_i)
5    0.25    −0.1845191    −0.1845204
10   0.50    −0.2735833    −0.2735857
15   0.75    −0.2284186    −0.2284204

7.
i    φ(x_i)       y(x_i)
3    1.0408182    1.0408182
6    1.1065307    1.1065306
9    1.3065697    1.3065697


9. A change in variable $w = (x - a)/(b - a)$ gives the boundary value problem

\[ -\frac{d}{dw}\bigl(p((b - a)w + a)\,y'\bigr) + (b - a)^2 q((b - a)w + a)\,y = (b - a)^2 f((b - a)w + a), \]

where $0 < w < 1$, $y(0) = \alpha$, and $y(1) = \beta$. Then Exercise 6 can be used.


13. For $c = (c_0, c_1, \ldots, c_{n+1})^t$ and $\phi(x) = \sum_{i=0}^{n+1} c_i \phi_i(x)$, we have

\[ c^t A c = \int_0^1 p(x)[\phi'(x)]^2 + q(x)[\phi(x)]^2 \, dx. \]

But $p(x) > 0$ and $q(x)[\phi(x)]^2 \ge 0$, so $c^t A c \ge 0$, and it can be 0, for $c \ne 0$, only if $\phi'(x) \equiv 0$ on $[0, 1]$. However, $\{\phi_0', \phi_1', \ldots, \phi_{n+1}'\}$ is linearly independent, so $\phi'(x) \not\equiv 0$ on $[0, 1]$ and $c^t A c = 0$ if and only if $c = 0$.


Exercise Set 12.1 (Page 723) 


1. The Poisson Equation Finite-Difference Algorithm gives the following results. 


i j Xi yj Wij u(Xi, Yi) 
1 1 0.5 0.5 0.0 0 

1 2 0.5 1.0 0.25 0.25 

1 3 0.5 1.5 1.0 1 


3. The Poisson Equation Finite-Difference Algorithm gives the following results. 


a. 30 iterations required: 


1 J Xi yj Wij u(x, yj) 
2 2 0.4 0.4 0.1599988 0.16 
2 4 0.4 0.8 0.3199988 0.32 
4 2 0.8 0.4 0.3199995 0.32 
4 4 0.8 0.8 0.6399996 0.64 


b. 29 iterations required: 


i J Xi yj Wij u(x, Yj) 

2 1 1.256637 0.3141593 0.2951855 0.2938926 
2 3 1.256637 0.9424778 0.1830822 0.1816356 
4 1 2.513274 0.3141593 —0.7721948 —0.7694209 
4 3 2.513274 0.9424778 —0.4785169 —0.4755283 


c. 126 iterations required: 


1 J Xj dj Wij u(Xi, Yj) 

4 3 0.8 0.3 1.2714468 1.2712492 
4 7 0.8 0.7 1.7509414 1.7506725 
8 3 1.6 0.3 1.6167917 1.6160744 
8 7 1.6 0.7 3.0659184 3.0648542 


d. 127 iterations required: 


i j Xj yj Wij U(X, Yj) 

2 2 1.2 1.2 0.5251533 0.5250861 
4 4 1.4 1.4 1.3190830 1.3189712 
6 6 1.6 1.6 2.4065150 2.4064186 
8 8 1.8 1.8 3.8088995 3.8088576 


7. The approximate potentials at some typical points are as follows. 


i J Xj Jj Wij 
1 4 0.1 0.4 88 
2 1 0.2 0.1 66 
4 2 0.4 0.2 66 
Exercise Set 12.2 (Page 736) 


1. The Heat Equation Backward-Difference Algorithm gives the following results. 


a.
i    j    x_i    t_j     w_{ij}      u(x_i, t_j)
1    1    0.5    0.05    0.632952    0.625037
2    1    1.0    0.05    0.895129    0.883937
3    1    1.5    0.05    0.632952    0.625037
1    2    0.5    0.1     0.566574    0.552493
2    2    1.0    0.1     0.801256    0.781344
3    2    1.5    0.1     0.566574    0.552493


3. The Crank-Nicolson Algorithm gives the following results. 


i    j    x_i    t_j     w_{ij}      u(x_i, t_j)

1    1    0.5    0.05    0.628848    0.625037
2    1    1.0    0.05    0.889326    0.883937
3    1    1.5    0.05    0.628848    0.625037
1    2    0.5    0.1     0.559251    0.552493
2    2    1.0    0.1     0.790901    0.781344
3    2    1.5    0.1     0.559252    0.552493


5. The Forward-Difference Algorithm gives the following results. 
a. For h= 0.4 and k = 0.1: 


I J Xj t Wij u(X;, t)) 
2 5 0.8 0.5 3.035630 0 
3 5 1.2 0.5 —3.035630 0 
4 5 1.6 0.5 1.876122 0 


For h = 0.4 and k = 0.05: 


I J Xj i Wij uU(X;, t)) 
2 10 0.8 0.5 0 0 
3 10 1.2 0.5 0 0 
4 10 1.6 0.5 0 0 


b. For h = π/10 and k = 0.05:


i j x_i t_j w_{ij} u(x_i, t_j)
3 10 0.94247780 0.5 0.4864832 0.4906936 
6 10 1.88495559 0.5 0.5718943 0.5768449 
9 10 2.82743339 0.5 0.1858197 0.1874283 


7. a. For h = 0.4 and k = 0.1:


i    j     x_i    t_j    w_{ij}      u(x_i, t_j)
2    5     0.8    0.5    −0.00258    0
3    5     1.2    0.5    0.00258     0
4    5     1.6    0.5    −0.00159    0


For h = 0.4 and k = 0.05:


i    j     x_i    t_j    w_{ij}              u(x_i, t_j)
2    10    0.8    0.5    −4.93 × 10^{−4}     0
3    10    1.2    0.5    4.93 × 10^{−4}      0
4    10    1.6    0.5    −3.05 × 10^{−4}     0


b. For h = π/10 and k = 0.05:
i j x_i t_j w_{ij} u(x_i, t_j)
3 10 0.94247780 0.5 0.4986092 0.4906936 
6 10 1.88495559 0.5 0.5861503 0.5768449 
9 10 2.82743339 0.5 0.1904518 0.1874283 


9. The Crank-Nicolson Algorithm gives the following results. 
a. For h = 0.4 and k = 0.1: 


i J Xi tj Wij u(Xi, t)) 
2 5 0.8 0.5 8.2 × 10^{−7} 0
3 5 1.2 0.5 −8.2 × 10^{−7} 0
4 5 1.6 0.5 5.1 × 10^{−7} 0


For h = 0.4 and k = 0.05: 


i J Xj t Wij u(X;, t) 
2 10 0.8 0.5 —2.6 x 10~® 0 
3 10 1.3 0.5 2.6 x 107° 0 
4 10 1.6 0.5 —1.6 x 10~° 0 


b. For h = π/10 and k = 0.05:


i j x_i t_j w_{ij} u(x_i, t_j)


3 10 0.94247780 0.5 0.4926589 0.4906936 
6 10 1.88495559 0.5 0.5791553 0.5768449 
9 10 2.82743339 0.5 0.1881790 0.1874283 


11. a. Using h = 0.4 and k = 0.1 leads to meaningless results. Using h = 0.4 and k = 0.05 again gives meaningless answers. 
Letting h = 0.4 and k = 0.005 produces the following: 


i J Xi tj Wij 

1 100 0.4 0.5 —165.405 

2 100 0.8 0.5 267.613 

3 100 1.2 0.5 —267.613 

4 100 1.6 0.5 165.405 

b. i J Xj tj w (xij) 

3 10 0.94247780 0.5 0.46783396 
6 10 1.8849556 0.5 0.54995267 
9 10 2.8274334 0.5 0.17871220 
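13. We have

\[ a_{11}v_1^{(i)} + a_{12}v_2^{(i)} = (1 - 2\lambda)\sin\frac{i\pi}{m} + \lambda\sin\frac{2\pi i}{m} \]

and

\[
\begin{aligned}
\mu_i v_1^{(i)} &= \left[1 - 4\lambda\left(\sin\frac{i\pi}{2m}\right)^2\right]\sin\frac{i\pi}{m}
= \left[1 - 4\lambda\left(\sin\frac{i\pi}{2m}\right)^2\right]\left(2\sin\frac{i\pi}{2m}\cos\frac{i\pi}{2m}\right)\\
&= 2\sin\frac{i\pi}{2m}\cos\frac{i\pi}{2m} - 8\lambda\left(\sin\frac{i\pi}{2m}\right)^3\cos\frac{i\pi}{2m}.
\end{aligned}
\]

However,

\[
\begin{aligned}
(1 - 2\lambda)\sin\frac{i\pi}{m} + \lambda\sin\frac{2\pi i}{m}
&= 2(1 - 2\lambda)\sin\frac{i\pi}{2m}\cos\frac{i\pi}{2m} + 2\lambda\sin\frac{i\pi}{m}\cos\frac{i\pi}{m}\\
&= 2(1 - 2\lambda)\sin\frac{i\pi}{2m}\cos\frac{i\pi}{2m}
 + 2\lambda\left[2\sin\frac{i\pi}{2m}\cos\frac{i\pi}{2m}\right]\left[1 - 2\left(\sin\frac{i\pi}{2m}\right)^2\right]\\
&= 2\sin\frac{i\pi}{2m}\cos\frac{i\pi}{2m} - 8\lambda\cos\frac{i\pi}{2m}\left(\sin\frac{i\pi}{2m}\right)^3.
\end{aligned}
\]

Thus

\[ a_{11}v_1^{(i)} + a_{12}v_2^{(i)} = \mu_i v_1^{(i)}. \]

Further,

\[
\begin{aligned}
a_{j,j-1}v_{j-1}^{(i)} + a_{j,j}v_j^{(i)} + a_{j,j+1}v_{j+1}^{(i)}
&= \lambda\sin\frac{i(j-1)\pi}{m} + (1 - 2\lambda)\sin\frac{ij\pi}{m} + \lambda\sin\frac{i(j+1)\pi}{m}\\
&= \lambda\left(\sin\frac{ij\pi}{m}\cos\frac{i\pi}{m} - \sin\frac{i\pi}{m}\cos\frac{ij\pi}{m}\right) + (1 - 2\lambda)\sin\frac{ij\pi}{m}\\
&\quad + \lambda\left(\sin\frac{ij\pi}{m}\cos\frac{i\pi}{m} + \sin\frac{i\pi}{m}\cos\frac{ij\pi}{m}\right)\\
&= \sin\frac{ij\pi}{m} - 2\lambda\sin\frac{ij\pi}{m} + 2\lambda\sin\frac{ij\pi}{m}\cos\frac{i\pi}{m}\\
&= \sin\frac{ij\pi}{m} + 2\lambda\sin\frac{ij\pi}{m}\left(\cos\frac{i\pi}{m} - 1\right)
\end{aligned}
\]

and

\[
\begin{aligned}
\mu_i v_j^{(i)} &= \left[1 - 4\lambda\left(\sin\frac{i\pi}{2m}\right)^2\right]\sin\frac{ij\pi}{m}
= \left[1 - 4\lambda\left(\frac{1}{2} - \frac{1}{2}\cos\frac{i\pi}{m}\right)\right]\sin\frac{ij\pi}{m}\\
&= \left[1 + 2\lambda\left(\cos\frac{i\pi}{m} - 1\right)\right]\sin\frac{ij\pi}{m},
\end{aligned}
\]

so

\[ a_{j,j-1}v_{j-1}^{(i)} + a_{j,j}v_j^{(i)} + a_{j,j+1}v_{j+1}^{(i)} = \mu_i v_j^{(i)}. \]

Similarly,

\[ a_{m-1,m-2}v_{m-2}^{(i)} + a_{m-1,m-1}v_{m-1}^{(i)} = \mu_i v_{m-1}^{(i)}, \]

so $A\mathbf{v}^{(i)} = \mu_i \mathbf{v}^{(i)}$.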
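
The eigenpairs verified in Exercise 13 can also be checked numerically. The sketch below is an illustration under assumptions: the function names and the sample values m = 6 and lambda = 0.3 are chosen here, and A is taken to be the (m − 1) x (m − 1) tridiagonal matrix with diagonal entries 1 − 2*lambda and off-diagonal entries lambda, as the entries a_{j,j−1}, a_{j,j}, a_{j,j+1} in the derivation indicate.

```python
import math


def tridiagonal_matrix(m, lam):
    """(m-1) x (m-1) matrix with 1 - 2*lam on the diagonal and lam beside it."""
    n = m - 1
    A = [[0.0] * n for _ in range(n)]
    for j in range(n):
        A[j][j] = 1.0 - 2.0 * lam
        if j > 0:
            A[j][j - 1] = lam
        if j < n - 1:
            A[j][j + 1] = lam
    return A


def max_eigen_residual(m, lam):
    """Largest entry of |A v - mu v| over all claimed eigenpairs (mu_i, v^(i))."""
    A = tridiagonal_matrix(m, lam)
    n = m - 1
    worst = 0.0
    for i in range(1, m):
        mu = 1.0 - 4.0 * lam * math.sin(i * math.pi / (2 * m)) ** 2
        v = [math.sin(i * j * math.pi / m) for j in range(1, m)]
        Av = [sum(A[r][c] * v[c] for c in range(n)) for r in range(n)]
        worst = max(worst, max(abs(Av[r] - mu * v[r]) for r in range(n)))
    return worst


residual = max_eigen_residual(6, 0.3)
print(residual)
```

A residual at the level of rounding error is consistent with the identity A v^(i) = mu_i v^(i) holding for every i = 1, ..., m − 1.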
15. To modify Algorithm 12.2, change the following:

Step 7  Set
        t = jk;
        z_1 = (w_1 + kF(h))/l_1.

Step 8  For i = 2, ..., m − 1 set
        z_i = (w_i + kF(ih) + λz_{i−1})/l_i.


To modify Algorithm 12.3, change the following:

Step 7  Set
        t = jk;
        z_1 = [(1 − λ)w_1 + (λ/2)w_2 + kF(h)]/l_1.

Step 8  For i = 2, ..., m − 1 set
        z_i = [(1 − λ)w_i + (λ/2)(w_{i+1} + w_{i−1} + z_{i−1}) + kF(ih)]/l_i.


17. To modify Algorithm 12.2, change the following:

Step 7  Set
        t = jk;
        w_0 = φ(t);
        z_1 = (w_1 + λw_0)/l_1;
        w_m = ψ(t).

Step 8  For i = 2, ..., m − 2 set
        z_i = (w_i + λz_{i−1})/l_i;
        Set
        z_{m−1} = (w_{m−1} + λw_m + λz_{m−2})/l_{m−1}.

Step 11  OUTPUT (t);
         For i = 0, ..., m set x = ih;
         OUTPUT (x, w_i).


To modify Algorithm 12.3, change the following:

Step 1  Set
        h = l/m;
        k = T/N;
        λ = α²k/h²;
        w_m = ψ(0);
        w_0 = φ(0).

Step 7  Set
        t = jk;
        z_1 = [(1 − λ)w_1 + (λ/2)w_2 + (λ/2)w_0 + (λ/2)φ(t)]/l_1;
        w_0 = φ(t).

Step 8  For i = 2, ..., m − 2 set
        z_i = [(1 − λ)w_i + (λ/2)(w_{i+1} + w_{i−1} + z_{i−1})]/l_i;
        Set
        z_{m−1} = [(1 − λ)w_{m−1} + (λ/2)(w_m + w_{m−2} + z_{m−2} + ψ(t))]/l_{m−1};
        w_m = ψ(t).

Step 11  OUTPUT (t);
         For i = 0, ..., m set x = ih;
         OUTPUT (x, w_i).
19. a. The approximate temperature at some typical points is given in the table.


i j r_i t_j w_{ij}

1 20 0.6 10 137.6753 
2 20 0.7 10 245.9678 
3 20 0.8 10 340.2862 
4 20 0.9 10 424.1537 


The strain is approximately J = 1242.537. 


Exercise Set 12.3 (Page 744) 


1. The Wave Equation Finite-Difference Algorithm gives the following results. 


i J Xi tj Wij uU(X;, t)) 
2 4 0.25 1.0 —0.7071068 —0.7071068 
3 4 0.50 1.0 —1.0000000 —1.0000000 
4 4 0.75 1.0 —0.7071068 —0.7071068 
3. The Wave Equation Finite-Difference Algorithm with h = π/10 and k = 0.05 gives the following results.

i    j     x_i     t_j    w_{ij}       u(x_i, t_j)
2    10    π/5     0.5    0.5163933    0.5158301
5    10    π/2     0.5    0.8785407    0.8775826
8    10    4π/5    0.5    0.5163933    0.5158301

The Wave Equation Finite-Difference Algorithm with h = π/20 and k = 0.1 gives the following results.

i     j    x_i     t_j    w_{ij}
4     5    π/5     0.5    0.5159163
10    5    π/2     0.5    0.8777292
16    5    4π/5    0.5    0.5159163

The Wave Equation Finite-Difference Algorithm with h = π/20 and k = 0.05 gives the following results.

i     j     x_i     t_j    w_{ij}
4     10    π/5     0.5    0.5159602
10    10    π/2     0.5    0.8778039
16    10    4π/5    0.5    0.5159602


5. The Wave Equation Finite-Difference Algorithm gives the following results. 


i    j    x_i    t_j    w_{ij}        u(x_i, t_j)

2    3    0.2    0.3    0.6729902     0.61061587
5    3    0.5    0.3    0             0
8    3    0.8    0.3    −0.6729902    −0.61061587


7. a. The air pressure for the open pipe is p(0.5, 0.5) ≈ 0.9 and p(0.5, 1.0) ≈ 2.7.
b. The air pressure for the closed pipe is p(0.5, 0.5) ≈ 0.9 and p(0.5, 1.0) ≈ 0.9187927.


Exercise Set 12.4 (Page 758) 


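1. With $E_1 = (0.25, 0.75)$, $E_2 = (0, 1)$, $E_3 = (0.5, 0.5)$, and $E_4 = (0, 0.5)$, the basis functions are

\[
\phi_1(x, y) = \begin{cases} 4x & \text{on } T_1 \\ -2 + 4y & \text{on } T_2, \end{cases}
\qquad
\phi_2(x, y) = \begin{cases} -1 - 2x + 2y & \text{on } T_1 \\ 0 & \text{on } T_2, \end{cases}
\]

\[
\phi_3(x, y) = \begin{cases} 0 & \text{on } T_1 \\ 1 + 2x - 2y & \text{on } T_2, \end{cases}
\qquad
\phi_4(x, y) = \begin{cases} 2 - 2x - 2y & \text{on } T_1 \\ 2 - 2x - 2y & \text{on } T_2, \end{cases}
\]

and $\gamma_1 = 0.323825$, $\gamma_2 = 0$, $\gamma_3 = 1.0000$, and $\gamma_4 = 0$.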
3. The Finite-Element Algorithm with K = 8,N = 8,M = 32,n = 9,m = 25, and NL = 0 gives the following results, where the 
labeling is as shown in the diagram. 


[Diagram: triangulation of the region showing the node and triangle labels]


γ_1 = 0.511023
γ_2 = 0.720476
γ_3 = 0.507899
γ_4 = 0.720476
γ_5 = 1.01885
γ_6 = 0.720476
γ_7 = 0.507896
γ_8 = 0.720476
γ_9 = 0.511023
γ_i = 0, 10 ≤ i ≤ 25


u(0.125, 0.125) ≈ 0.614187
u(0.125, 0.25) ≈ 0.690343
u(0.25, 0.125) ≈ 0.690343
u(0.25, 0.25) ≈ 0.720476
5. The Finite-Element Algorithm with K = 0,N = 12,M = 32,n = 20,m = 27, and NL = 14 gives the following results, where 
the labeling is as shown in the diagram. 


[Diagram: triangulation of the region showing the node and triangle labels]




u(1, 0) ≈ 22.92824
u(4, 0) ≈ 22.84663


2 2. 


(34 


& 18.85895 


γ_1 = 21.40335    γ_8 = 24.19855     γ_15 = 20.23334    γ_22 = 15
γ_2 = 19.87372    γ_9 = 24.16799     γ_16 = 20.50056    γ_23 = 15
γ_3 = 19.10019    γ_10 = 27.55237    γ_17 = 21.35070    γ_24 = 15
γ_4 = 18.85895    γ_11 = 25.11508    γ_18 = 22.84663    γ_25 = 15
γ_5 = 19.08533    γ_12 = 22.92824    γ_19 = 24.98178    γ_26 = 15
γ_6 = 19.84115    γ_13 = 21.39741    γ_20 = 27.41907    γ_27 = 15
γ_7 = 21.34694    γ_14 = 20.52179    γ_21 = 15
A-stable, 351 
A-orthogonal, 481 
Absolute 
deviation, 499 
error, 20 
stability, region of, 351 
Accelerating convergence, 86 
Accuracy, degree of, 197 
Adams Fourth-Order Predictor-Corrector 
algorithm, 311 
Adams Variable Step-Size 
Predictor-Corrector algorithm, 317 
Adams, John Couch, 303 
Adams-Bashforth methods 
definition, 303, 307 
stability of, 346 
Adams-Moulton methods 
definition, 303, 308 
stability of, 346 
Adaptive quadrature 
error estimate, 224 
algorithm, 226 
method, 223 
Aitken’s Δ² method, 87, 579, 581, 585 
Aitken, Alexander, 87 
al-Khwârizmî, Muhammad ibn-Mûsâ, 32 
Algebraic polynomial, 91, 106 
Algorithm 
Adams Fourth-Order 
Predictor-Corrector, 311 
Adams Variable Step-Size 
Predictor-Corrector, 317 
Adaptive Quadrature, 226 
Bézier Curve, 169 
Bisection, 48 
Broyden’s, 650 
cautious Romberg, 220 
Chebyshev Rational Approximation, 
535 
Cholesky’s, 418 
Clamped Cubic Spline, 155 
Composite Simpson’s, 207 
conditionally stable, 34 
Crank-Nicolson, 734 
Crout Factorization for Tridiagonal 


Linear Systems, 422 

Cubic Spline Rayleigh-Ritz, 707 

description, 32 

Euclidean norm, 41 

Euler’s, 267 

Extrapolation, 323 

Fast Fourier Transform, 553 

Finite-Element, 753 

Fixed Point Iteration, 60 

Gauss-Seidel Iterative, 456 

Gaussian Double Integral, 246 

Gaussian Elimination with Backward 
Substitution, 364 

Gaussian Elimination with Partial 
Pivoting, 374 

Gaussian Elimination with Scaled 
Partial Pivoting, 376 

Gaussian Triple Integral, 248 

general-purpose, 41 

Heat Equation Backward-Difference, 
730 

Hermite Interpolation, 141 

Horner’s, 94 

Householder, 598 

Inverse Power Method, 584 

Iterative Refinement, 474 

Jacobi Iterative, 453 

LDL^t Factorization, 417 

Linear Finite-Difference, 687 

Linear Shooting, 674 

LU Factorization, 405 

Method of False Position, 73 

Müller’s, 97 

Natural Cubic Spline, 149 

Neville’s Iterated Interpolation, 122 

Newton’s Divided-Difference, 126 

Newton’s Method, 67 

Newton’s Method for Systems, 641 

Newton-Raphson, 67 

Nonlinear Finite-Difference, 692 

Nonlinear Shooting, 681 

Padé Rational Approximation, 531 

Piecewise Linear Rayleigh-Ritz, 702 

Poisson Equation Finite-Difference, 720 

Power Method, 578 


QR, 608 
Romberg, 219 
Runge-Kutta Method for Systems of 
Differential Equations, 330 
Runge-Kutta Order Four, 288 
Runge-Kutta-Fehlberg, 297 
Secant, 71 
Simpson’s Double Integral, 245 
SOR, 466 
special-purpose, 41 
stable, 34 
Steepest descent, 658 
Steffensen’s, 88 
Symmetric power method, 581 
Trapezoidal with Newton Iteration, 
352 
unstable, 34 
Wave Equation Finite-Difference, 742 
Wielandt Deflation, 588 
Annihilation technique, 591 
Annuity due equation, 77 
Approximating π, 192 
Approximation theory, 497 
Archimedes, 185, 192 
Asymptotic error constant, 79 
Augmented matrix, 360 
Average value of a function, 10 


B-splines, 705 
Bézier Curve algorithm, 169 
Bézier polynomial, 169 
Bézier, Pierre Etienne, 169 
Backward difference 
formula, 130, 174 
method, 729 
notation, 130 
Backward error analysis, 476 
Backward Euler method, 355 
Backward substitution 
Gaussian elimination, 361 
Backward-substitution, 359, 362 
Band 
matrix, 421 
width, 421 
Basis for R^n, 564 
Basis functions 
B-spline, 705 
piecewise bilinear, 748 
piecewise linear, 699, 748 
Beam deflection problem, 671, 690, 696 
Beetle population problem, 450 
Bell shaped spline, 705 
Bernoulli equation, 301 
Bernoulli, Daniel, 529, 538 
Bernstein polynomial, 117, 170 
Bessel function, 118 
Bilinear basis functions, 748 
Binary 
digit, 18 
representation of a number, 18 
search method, 48 
Bisection algorithm, 48 
Bisection method 
as a starting procedure, 50 
description, 48 
rate of convergence, 51 
stopping procedure, 49 
Bit, 18 
BLAS, 44 
Boundary-value problem 
B-splines, 705 
centered difference formula, 685 
Collocation method, 710 
Cubic Spline Rayleigh-Ritz algorithm, 
707 
definition, 672 
extrapolation, 688, 694 
finite-difference method, 684, 691 
Galerkin method, 709 
linear, 673, 684 
Linear Finite-Difference algorithm, 687 
linear shooting algorithm, 674 
linear shooting method, 674 
nonlinear, 678, 691 
Nonlinear Finite-Difference algorithm, 
692 
Nonlinear Shooting algorithm, 681 
nonlinear shooting method, 678 
Piecewise Linear Rayleigh-Ritz 
algorithm, 702 
Rayleigh-Ritz method, 696 
reverse shooting technique, 677 
two-point, 672 
Brent’s method, 102 
Bridge truss, 431, 462, 468 
Briggs, Henry, 174 
Brouwer, L. E. J., 56 
Broyden’s algorithm, 650 
Broyden’s method, 648 
Bulirsch-Stoer extrapolation, 327 
Bunyakovsky, Viktor Yakovlevich, 434 


C, 40 
Car on a race track problem, 213 
Cauchy’s method, 102 
Cauchy, Augustin-Louis, 3, 261, 434 
Cauchy-Bunyakovsky-Schwarz inequality, 
434, 442 
Cautious Romberg algorithm, 220 
Cautious Romberg method, 259 
Center of mass of a lamina problem, 252 
Center of mass problem, 249 
Centered difference formula, 132, 685, 
732 
Characteristic, 18 
Characteristic polynomial, 344, 350, 443 
Characteristic value (see also eigenvalue), 
443 
Characteristic vector (see also 
eigenvector), 443 
Chebyshev polynomial 
definition, 518 
economization, 526 
extrema, 521 
monic, 521 
zeros, 521 
Chebyshev Rational Approximation 
algorithm, 535 
Chebyshev, Pafnuty Lvovich, 519 
Chemical reaction problem, 293 
Cholesky algorithm, 418 
Cholesky’s method, 405 
Cholesky, Andre-Louis, 418 
Chopping arithmetic, 20 
in Maple, 31 
Circular cylinder problem, 101 
Clamped boundary, 146, 705 
Clamped Cubic Spline algorithm, 155 
Clavius, Christopher, 533 
Closed method (see implicit method), 303 
Closed Newton-Cotes formulas, 200 
Coaxial cable problem, 724 
Cofactor of a matrix, 396 
College GPA-ACT problem, 508 
Collocation method, 710 
Column vector, 360 
Complete pivoting, 379 
Complex conjugate, 96 
Complex zeros (roots), 96 
Composite midpoint rule, 209 
Composite numerical integration, 204 
Composite Simpson’s algorithm, 207 
Composite Simpson’s rule, 207 
double integrals, 245 
Composite trapezoidal rule, 208 
Computer 
arithmetic, 18 
graphics, 166, 169 


software, 40 
Condition number 

approximating, 471 

definition, 470 
Conditionally stable, 729 
Conditionally stable algorithm, 34 
Conformist problem, 276 
Conjugate direction method, 484 
Conjugate gradient method, 479 
Consistent 

multistep method, 343 

one-step method, 339 
Contagious disease problems, 301 
Continuation method, 668 
Continued-fraction, 533 
Continuity 

related to convergence, 3 

related to derivatives, 4 
Continuous function 

from R to R, 3 

from R^n to R, 632 

from R^n to R^n, 632 
Continuous least squares, 539 
Contraction Mapping Theorem, 632 
Convergence 

accelerating, 86 

cubic, 86 

linear, 79 

of vectors, 436 

order of, 37, 79 

quadratic, 79 

rate of, 37 

related to continuity, 3 

superlinear, 91, 648 
Convergent 

matrix, 448 

multistep method, 343 

one-step method, 339 

sequence, 3 

vectors, 432 
Convex set, 261 
Cooley and Tukey algorithm, 548 
Coordinate function, 630 
Corrugated roofing problem, 173, 214 
Cotes, Roger, 198 
Cramer’s rule, 400 

operation counts, 400 
Crank, John, 733 
Crank-Nicolson algorithm, 734 
Crank-Nicolson method, 733 
Crash-survivability problem, 508 
Crout factorization, 722, 730 
Crout Factorization for Tridiagonal Linear 

Systems algorithm, 422 

Crout’s method, 405, 421, 721, 730, 734 
Cubic convergence, 86 




Cubic Hermite interpolation, 144, 166, 
280 
Cubic Hermite polynomial, 144, 280 
piecewise, 166 
Cubic spline 
algorithms, 149, 155 
error-bound, 160 
interpolant, 146 
interpolation, 145, 705 
Cubic Spline Rayleigh-Ritz algorithm, 
707 
Cylinder, temperature in, 738 


d’Alembert, Jean, 92, 538 
Data compression, 624 
de Boor, Carl, 705 
Decimal machine number, 20 
Decomposition, singular value, 614 
Deflation, 95, 586 
Degree of accuracy, of a quadrature 
formula, 197 
Degree of precision, of a quadrature 
formula, 197 
Derivative 
approximation, 174 
definition, 3 
directional, 655 
relative to continuity, 4 
Determinant of a matrix, 396 
operation counts, 399 
Diagonal matrix, 386 
Diagonalization, 571 
Diagonally dominant matrix, 412 
Difference 
backward, 130 
equation, 267 
forward, 88, 129 
Differentiable function, 3 
Differential equation 
approximating, 260, 673 
boundary-value (see boundary-value 
problems), 672 
higher order, 328 
initial-value (see initial-value 
problems), 260 
perturbed, 263 
stiff, 348 
system, 328 
well posed, 263 
Diffusion equation, 715 
Direct Factorization of a matrix, 400 
Direct methods, 357 
Directional derivative, 655 
Dirichlet boundary conditions, 714 
Dirichlet, Johann Peter Gustav Lejeune, 
714 


Discrete least squares, 498, 541 
Disk brake problem, 214 
Distance between matrices, 438 
Distance between vectors, 435 
Distribution of heat 

steady state, 713 
Divided difference, 125 

kth, 125 

first, 125 

related to derivative, 139 
Doolittle’s method, 405, 421 
Double integral, 237 
Drug concentration problem, 77 


Economization of power series, 526 
Eigenvalue 
approximating, 562 
definition, 443 
Eigenvector 
definition, 443 
linear independence, 567 
orthonormal, 572 
EISPACK, 44, 627 
Electrical circuit problems, 184, 275, 321, 
331, 357 
Electrical transmission problem, 745 
Electrostatic potential problem, 678 
Elliptic partial differential equation, 713, 
716 
Energy of moth problem, 509 
Equal matrices, 381 
Equations, normal, 698 
Erf, 16, 116, 222 
Error 
absolute, 20 
control, 293, 315 
exponential growth, 34 
function, 16, 116, 222 
global, 339 
in computer arithmetic, 18 
linear growth, 34 
local, 277 
local truncation, 276, 306, 340, 342 
relative, 20 
round-off, 18, 20, 180, 184 
truncation, 11 
Escape velocity problem, 258 
Euclidean norm (see also l_2 norm), 41, 433 
Euler’s algorithm, 267 
Euler’s constant, 40 
Euler’s method, 266 
definition, 266 
error bound, 271, 273 
Euler’s modified method, 286 
Euler, Leonhard, 266, 538 
Explicit method, 200, 302 
Exponential error growth, 34 
Exponential least squares, 504 
Extended midpoint rule (see also 
composite midpoint rule), 209 
Extended Simpson’s rule (see also 
composite Simpson’s rule), 207 
Extended trapezoidal rule (see also 
composite trapezoidal rule), 208 
Extrapolation 
Bulirsch-Stoer, 327 
derivatives, 185 
Gragg, 321 
initial-value problem, 321 
integration, 215 
linear boundary-value problem, 688 
midpoint method, 321 
nonlinear boundary-value problem, 694 
Richardson’s, 185, 688, 694 
Extrapolation algorithm, 323 
Extreme Value Theorem, 5 


Factorization of a matrix, 400 
False position, method of, 73 
Fast Fourier Transform algorithm, 553 
Fast Fourier transform method, 548 
operation counts, 550 
Fehlberg, Erwin, 296 
Fibonacci 
problem, 101 
sequence, 40 
Fibonacci (Leonardo of Pisa), 101 
Finite-difference method, 717 
linear, 684 
nonlinear, 691 
Finite-digit arithmetic, 22 
Finite-Element algorithm, 753 
Finite-element method, 746 
First divided difference, 125 
Five-point formula, 178 
Fixed point 
definition, 56, 633 
iteration, 60 
Fixed Point Iteration algorithm, 60 
Fixed Point Theorem, 62, 633 
Floating-point form, 20 
Flow of heat in a rod, 714 
Food supply problem, 371 
FORTRAN, 40 
Forward difference 
formula, 129, 174 
method, 726 
notation, 88, 129 
Fourier series, 539 
Fourier, Jean Baptiste Joseph, 538, 539 
Fourth-order Adams-Bashforth, 303 
Fourth-order Adams-Moulton, 303 
Fraction, continued, 533 
Fredholm integral equation, 371 
Free boundary, 146, 705 
Fresnel integrals, 230 
Frobenius norm of a matrix, 442 
Fruit fly problem, 428, 575 
Function 

average value, 10 

Bessel, 118 

continuous, 3, 632 

coordinate, 630 

differentiable, 3 

differentiable on a set, 3 

error, 16, 116, 222 

from R to R, 3 

from R^n to R, 632 

from R^n to R^n, 632 

limit, 2, 632 

normal density, 213 

orthogonal, 515 

orthonormal, 515 

rational, 528 

signum, 54 

weight, 514 
Functional iteration, 60 
Fundamental Theorem of Algebra, 91 


Galerkin method, 709 
Galerkin, Boris Grigorievich, 709 
GAUSS, 45 
Gauss, Carl Friedrich, 92 
Gauss-Jordan method, 370 
operation counts, 370 
Gauss-Seidel iteration, 719 
Gauss-Seidel Iterative algorithm, 456 
Gauss-Seidel iterative method, 454 
Gauss-Seidel method for nonlinear 
systems, 636 
Gaussian Double Integral algorithm, 246 
Gaussian Elimination 
backward substitution, 362 
description, 361 
operation count, 366 
with Partial Pivoting, 374 
with Scaled Partial Pivoting, 375 
Gaussian Elimination with Backward 
Substitution algorithm, 364 
Gaussian Elimination with Partial Pivoting 
algorithm, 374 
Gaussian Elimination with Scaled Partial 
Pivoting algorithm, 376 
Gaussian quadrature 
for double integrals, 243 
for single integrals, 230 
for triple integrals, 248 
Gaussian transformation matrix, 402 


Gaussian Triple Integral algorithm, 248 
Gaussian-Kronrod method, 259 
General purpose software, 41 
Generalized Rolle’s Theorem, 8 
Geršgorin Circle Theorem, 562 
Geršgorin, Semyon Aranovich, 562 
Girard, Albert, 92 
Givens, James Wallace, 602 
Global error, 339 

related to local truncation error, 340, 

343 

Golden ratio, 40 
Golub, Gene, 614 
Gompertz population growth, 78 
Gradient, 655 
Gragg extrapolation, 321 
Gram, Jørgen Pedersen, 515 
Gram-Schmidt process, 515, 567 
Graphics, computer, 166, 169 
Gravity flow discharge problem, 646 
Great Barrier Reef problem, 508 
Grid lines, 716 
Growth of error 

exponential, 34 

linear, 34 
Guidepoint, 167 


Harmonic series, 40 
Harriot, Thomas, 174 
Heat distribution, 718 
steady state, 713 
Heat equation, 713 
Heat Equation Backward-Difference 
algorithm, 730 
Heat flow in a rod, 714, 738 
Heine, Heinrich Eduard, 3 
Hermite Interpolation algorithm, 141 
Hermite piecewise cubic polynomial, 144, 
166, 280 
Hermite polynomial, 136 
divided difference form, 139 
error formula, 137 
Hermite, Charles, 136 
Hestenes, Magnus, 479 
Heun, Karl, 287 
Higher derivative approximation, 179 
Higher order differential equation, 328 
Higher order initial-value problem, 328 
Hilbert matrix, 478, 512 
Hilbert, David, 512 
History problem, 276 
Homework-final grades problem, 507 
Homotopy method, 668 
Hompack, 669 
Hooke’s law, 497, 507 
Horner’s algorithm, 94 


Horner’s method, 92 

Horner, William, 92 

Hotelling deflation, 591 

Householder method, 593 

Householder transformation, 593 

Householder’s algorithm, 598 

Householder, Alston, 593 

Huygens, Christiaan, 185 

Hyperbolic partial differential equation, 
715, 739 


Ideal gas law, 1, 32 
Identity matrix, 386 
IEEE Arithmetic Standard, 18 
Ill-conditioned matrix, 471 
IML++, 495 
Implicit method, 201, 303 
Implicit trapezoidal method, 351 
Improper integral, 253 
IMSL, 45, 171, 259, 356, 430, 558, 
712, 760 
Induced matrix norm, 438 
Initial-value problem 
A-stable method, 351 
Adams Predictor-Corrector algorithm, 
310 
Adams Variable step-Size 
Predictor-Corrector algorithm, 317 
Adams-Bashforth method, 303, 307 
Adams-Moulton method, 303, 308 
adaptive methods, 294 
backward Euler method, 355 
Bernoulli equation, 301 
characteristic polynomial, 344, 350 
consistent method, 339, 343 
convergent method, 339, 343 
definition, 260 
error control, 293, 315 
Euler’s algorithm, 267 
Euler’s method, 266 
existence, 262 
extrapolation, 321 
Extrapolation algorithm, 323 
higher order, 328 
Implicit trapezoidal method, 351 
local truncation error, 276, 306, 342 
m-step multistep method, 302 
midpoint method, 286, 321 
Milne’s method, 313 
Milne-Simpson method, 314 
modified Euler method, 286 
multistep method, 302 
perturbed, 263 
predictor-corrector method, 310 
region of absolute stability, 351 
root condition, 345 
Runge-Kutta order four, 288 
Runge-Kutta Order Four algorithm, 
288 


Runge-Kutta-Fehlberg algorithm, 297 


Simpson’s method, 313 
stable method, 340 
stiff equation, 348 
Strong stability, 345 
Taylor method, 276 
Trapezoidal Method algorithm, 352 
uniqueness, 262 
unstability, 345 
weak stability, 345 
well-posed problem, 263 
Inner product, 479 
Integral 
improper, 253 
multiple, 237 
Riemann, 9 
Integration 
composite, 204 
Midpoint rule, 201 
Simpson’s rule, 196, 200 
Simpson’s three-eighths rule, 200 
trapezoidal rule, 194, 200 
Intermediate Value Theorem, 8 
Interpolation, 108 
cubic Hermite, 144, 280 
cubic spline, 145 
description, 105 
Hermite polynomial, 136 
inverse, 124 
iterated inverse, 124 
Lagrange polynomial, 110 
linear, 109 
Neville’s method, 120 
piecewise linear, 144 
polynomial, 108 
quadratic spline, 145 
Taylor polynomial, 106 
trigonometric, 171 


zeros of Chebyshev polynomials, 524 


Inverse interpolation, 124 
Inverse matrix, 386 
Inverse power method, 583 
Inverse Power Method algorithm, 584 
Invertible matrix, 386 
Isotropic, 713 
Iterated inverse interpolation, 124 
Iterative refinement, 469, 474 
Iterative Refinement algorithm, 474 
Iterative technique 
definition, 450 
Gauss-Seidel, 454 
Jacobi, 450 
ITPACK, 495 


Jacobi Iterative algorithm, 453 

Jacobi iterative method description, 
450 

Jacobi method for a symmetric matrix, 
612 

Jacobi, Carl Gustav Jacob, 451 

Jacobian matrix, 640 

JAVA, 40 

Jenkins-Traub method, 102 


kth divided difference, 125 

Kahan’s Theorem, 465 

Kentucky Derby problem, 163 
Kirchhoff’s Laws, 184, 275, 331, 357 
Kowa, Takakazu Seki, 87, 396 
Krylov, Aleksei Nikolaevich, 495 
Kutta, Martin Wilhelm, 283 


l_1 norm 

of a matrix, 442 

of a vector, 441 
l_2 norm 

of a matrix, 439, 446 

of a vector, 432 
l_∞ norm 

of a matrix, 439, 440 

of a vector, 433 
Ladder problem, 100 
Lagrange polynomial 

definition, 110 

error formula, 112 

recursively generating, 119 
Lagrange, Joseph Louis, 110, 361 
Laguerre polynomial, 258, 518 
Laguerre’s method, 102 
LAPACK, 44, 429, 495, 627 
Laplace equation, 678, 714 
Laplace, Pierre-Simon, 714 
LDL^t factorization, 417 
LDL^t Factorization algorithm, 417 
Leading principal submatrix, 416 
Least squares 

continuous, 510, 539 

discrete, 498, 541 

exponential, 504 

general, 499 

linear, 499 
Least-change secant update methods, 

648 

Legendre polynomial, 232, 516 
Legendre, Adrien-Marie, 233 
Leibniz, Gottfried, 396 
Levenberg-Marquardt method, 669 
Light diffraction problem, 230 
Limit of a function 

from R to R, 3 




from R^n to R, 632 

from R^n to R^n, 632 
Limit of a sequence, 3, 436 
Linear 

approximation, 499 

basis functions, 699, 748 

boundary value problem, 673 

convergence, 79 

error growth, 34 

interpolation, 109 

shooting method, 674 
Linear Finite-Difference algorithm, 687 


Linear finite-difference method, 684 
Linear Shooting algorithm, 674 
Linear system 
backward substitution, 359, 361 
definition, 357 
reduced form, 359, 386, 400 
simplifying, 358 
triangular form, 359, 362, 386, 400 
Linearly dependent 
functions, 512 
vectors, 564 
Linearly independent 
eigenvectors, 567 
functions, 512 
vectors, 564 
LINPACK, 44, 495 
Lipschitz condition, 17, 261, 329 
Lipschitz constant, 17, 261 
Lipschitz, Rudolf, 261 
LL^t factorization, 417 
Local definition, 339 
Local error, 277 
Local truncation error 
of multistep methods, 306, 342 
of one step method, 276 
of one-step method, 340 
of Runge-Kutta methods, 290 
related to global error, 340, 343 
Logistic population growth, 78, 328 
Lower triangular matrix, 386, 400 
LU factorization of matrices, 400 
operation counts, 411 
LU Factorization algorithm, 405 


m-step multistep method, 302 
Machine number, 18 
Maclaurin 
polynomial, 11 
series, 11 
Maclaurin, Colin, 11 
Mantissa, 18 
Maple, 40, 45 
adamsbashforth, 309 
adamsbashforthmoulton, 313 
adamsmoulton, 309 
adaptive, 228 
AddPoint, 122 
BackSubstitution, 405 
BackwardSubstitute, 365 
chebyshev, 537 
chopping arithmetic, 31 
ConditionNumber, 471 
convert, 13, 530 
CurveFitting, 152 

deq, 264 

Determinant, 397 

diff, 6 

Digits, 13 

dsolve, 264, 333 
Eigenvalues, 445 
Eigenvectors, 445 

eqns, 636 

evalm, 378 
ExponentialFit, 506 
ForwardSubstitution, 405 
fsolve, 6, 77 


Runge-Kutta for higher order equations, 
336 

Runge-Kutta-Fehlberg for higher order 
equations, 336 

Runge-Kutta-Fehlberg for systems, 334 

series, 530 

simplify, 40 

simpson, 210 

solve, 77 

SOR, 467 

Statistics, 503 

taylor, 13 

TaylorApproximation, 283 

Transpose, 390 

trapezoid, 210 

trunc, 31 

vars, 636 

with, 6 

with(LinearAlgebra), 365 

with(Student), 210 

with(Student[NumericalAnalysis]), 
210 


Gaussian Elimination with Partial 
Pivoting algorithm, 374 
Gaussian Elimination with Scaled 

Partial Pivoting algorithm, 376 
Gaussian transformation, 402 
Hilbert, 478, 512 
identity, 386 
ill-conditioned, 471 
induced norm, 438 
inverse, 386 
invertible, 386 
Iterative Refinement algorithm, 474 
Jacobi Iterative algorithm, 453 
Jacobian, 640 
l_1 norm, 442 
l_2 norm, 439, 446 
l_∞ norm, 439, 440 
LDL^t factorization, 417 
LDL^t Factorization algorithm, 417 
LL^t factorization, 417 
lower triangular, 386, 400 
LU factorization, 400 


LU Factorization algorithm, 405 
minor, 396 
Mathematica, 40 
MATLAB, 40, 45, 103, 172, 430 
Matrix multiplication, 384 
GaussianElimination, 365 
Gauss-Seidel, 457 
implicitplot, 643 


implicitplot3d, 644 

init, 264 

InitialValueProblem, 269 
IsDefinite, 424 
IsMatrixShape, 424 
Jacobi, 454 
LinearAlgebra, 445 
LinearFit, 503 

LU Factorization, 405 
Matrix, 365 
MatrixDecomposition, 405 
MatrixInverse, 390 
Multint, 250 
MultivariateCalculus, 250, 283 
NevilleTable, 121 
newtoncotes, 211 
NonlinearFit, 506 
numapprox, 536 

options, 636 

orthopoly, 536 

plot, 6 

PLU Decomposition, 409 
polynom, 13 

Quadrature, 210 

ratpoly, 530 

restart, 6 

rhs, 264 

romberg, 220 

rounding arithmetic, 22 
RowOperation, 365 


addition, 382 

augmented, 360 

band, 421 

characteristic polynomial, 443 

Cholesky’s algorithm, 418 

Cholesky’s method, 405 

cofactor of, 396 

complete (or maximal) pivoting, 379 

condition number, 470 

convergent, 448 

Cramer’s rule, 400 

Crout Factorization for Tridiagonal 
Linear Systems algorithm, 422 

Crout’s method, 405, 421 

definition, 359 

determinant facts, 397 

determinant, 396 

diagonal, 386 

diagonalization, 571 

diagonally dominant, 412 

distance between, 438 

Doolittle’s method, 405, 421 

eigenvalue, 443 

eigenvector, 443 

equal, 381 

equivalent statements, 398 

factorization, 400 

Frobenius norm, 442 

Gauss-Jordan method, 370 

Gauss-Seidel Iterative algorithm, 
456 


natural norm, 438 

nilpotent, 449 

nonnegative definite, 573 

nonsingular, 386 

norm, 438 

nullity of, 614 

orthogonal, 570 

orthogonally diagonalizable, 572 

partial pivoting, 374 

permutation, 407 

persymmetric, 569 

pivot element, 363 

pivoting, 372 

positive definite, 414, 416, 461, 573, 
730, 734 

positive semidefinite, 573 

product, 384 

P^tLU factorization, 407 

QR algorithm, 608 

rank of, 614 

reduced to diagonal, 572 

reduced to tridiagonal, 593 

rotation, 602 

scalar multiplication, 382 

Scaled Partial Pivoting, 375 

similar, 571 

similarity transformation, 571 

singular, 386 

singular values, 616 

SOR algorithm, 466 

sparse, 431 
spectral radius, 446 
square, 385 
strictly diagonally dominant, 412, 730, 
734 
submatrix, 396 
sum, 382 
symmetry, 390 
transformation, 402 
transpose, 390 
tridiagonal, 421, 730, 734 
unitary, 572 
upper Hessenberg, 600, 610 
upper triangular, 386, 401 
well-conditioned, 471 
zero, 382 
Matrix-matrix product, 384 
Matrix-vector product, 383 
Maximal column pivoting (see partial 
pivoting), 374 
Maximal pivoting, 379 
Maximum temperature for hydra problem, 
646 
Mean Value Theorem, 4 
Mean Value Theorem for Integrals, 10 
Mesh points, 266, 716 
Method of collocation, 710 
Method of false position, 73 
Method of False Position algorithm, 73 
Method of steepest descent, 481, 654 
Midpoint method, 286, 321 
Midpoint rule, 201 
composite, 209 
error term, 201 
Milne’s method, 313 
stability of, 346 
Milne, Edward Arthur, 313 
Milne-Simpson method, 314 
stability of, 347 
Minimax, 499 
Minor, 396 
Modified Euler method, 286 
Monic polynomial, 521 
Moulton, Forest Ray, 303 
mth-order system, 328 
Müller’s algorithm, 97 
Müller’s method, 96 
Multiple integrals, 237 
Multiplicity of a root, 82 
Multistep method, 302 


(n + 1)-point formula, 176 

NAG, 45, 102, 171, 259, 356, 430, 558, 
712, 760 

NASTRAN, 761 

Natural boundary, 146, 705 

Natural Cubic Spline algorithm, 149 


Natural matrix norm, 438 
Natural spline, 147 
Nested arithmetic, 27, 92 
Nested polynomial, 28 
Netlib, 103, 171, 356, 559 
Neville’s Iterated Interpolation algorithm, 
122 
Neville’s method, 120 
Neville, Eric Harold, 120 
Newton backward difference formula, 
130 
Newton backward divided-difference 
formula, 130 
Newton forward difference formula, 129 
Newton interpolatory divided-difference 
formula, 126 
Newton’s Divided-Difference algorithm, 
126 
Newton’s method 
convergence criteria, 70 
definition, 67 
description, 67 
for nonlinear systems, 640 
for stiff equations, 352 
modified for multiple roots, 84, 86 
quadratic convergence of, 82, 639 
Newton’s Method algorithm, 67 
Newton’s method for nonlinear 
boundary-value problems, 680 
Newton’s Method for Systems algorithm, 
641 
Newton, Isaac, 67 
Newton-Cotes closed formulas, 200 
Newton-Cotes open formulas, 201 
Newton-Raphson algorithm, 67 
Newton-Raphson method, 67 
Nicolson, Phyllis, 733 
Nilpotent matrix, 449 
Noble beast problem, 164 
Nodes, 110, 145, 748 
Nonlinear Finite-Difference algorithm, 
692 
Nonlinear finite-difference method, 
691 
Nonlinear Shooting algorithm, 681 
Nonlinear shooting method, 678 
Nonlinear systems, 630 
Nonnegative definite matrix, 573 
Nonsingular matrix, 386 
Norm equivalence of vectors, 438 
Norm of a matrix 
definition, 438 
Frobenius, 442 
induced, 438 
l_1, 442 
l_2, 439, 446 
l_∞, 439, 440 
natural, 438 
Norm of a vector 
algorithm, 41 
definition, 432 
l_1, 441 
l_2, 432 
l_∞, 432 
Normal density function, 213 
Normal equations, 500, 502, 511, 698 
Nullity of a matrix, 614 
Numerical differentiation 
backward difference formula, 174 
description, 174 
extrapolation applied to, 187 
five-point formula, 178 
forward difference formula, 174 
higher derivatives, 179 
instability, 182 
(n + 1)-point formula, 176 
Richardson’s extrapolation, 185 
round-off error, 180, 184 
three-point formula, 178 
Numerical integration 
adaptive quadrature, 223 
Adaptive Quadrature algorithm, 
226 
closed formula, 200 
composite, 204 
composite midpoint rule, 209 
composite Simpson’s rule, 207 
composite trapezoidal rule, 208 
double integral, 237 
explicit formula, 200 
extrapolation, 215 
Gaussian quadrature, 230, 243, 248 
Gaussian-Kronrod, 259 
implicit formula, 201 
improper integral, 253 
midpoint rule, 201 
multiple integral, 237 
Romberg, 215 
Simpson’s rule, 196, 200 
Simpson’s three-eighths rule, 200 
stability, 211 
trapezoidal rule, 194, 200 
triple integral, 248 
Numerical quadrature (see numerical 
integration), 193 
Numerical software, 40 


O notation, 37 

Oak leaves problem, 116, 163 

One-step methods, 302 

Open formula, 201 

Open method (see explicit method), 302 
Open Newton-Cotes formulas, 201 
Operation counts 

Cramer’s rule, 400 

factorization, 401, 411 

fast Fourier transform, 550 

Gauss-Jordan, 370 

Gaussian elimination, 366 

LU factorization, 411 

scaled partial pivoting, 378 
Order of convergence, 37 
Ordinary annuity equation, 77 
Organ problem, 745 
Orthogonal matrix, 570, 614 
Orthogonal polynomials, 510 
Orthogonal set 

of functions, 515 

of vectors, 566 
Orthogonally diagonalizable, 572 
Orthonormal set 

of functions, 515 

of vectors, 566 
Osculating polynomial, 136 
Ostrowski-Reich Theorem, 465 
Over relaxation method, 464 
Overflow, 19 


π, approximating, 192 
Padé approximation technique, 529 
Padé Rational Approximation algorithm, 
531 
Padé, Henri, 529 
Parabolic partial differential equation, 
714, 725 
Parametric curve, 164 
Partial differential equation 
Backward difference method, 729 
Centered-Difference formula, 732 
Crank-Nicolson algorithm, 734 
Crank-Nicolson method, 733 
elliptic, 713, 716 
finite element method, 746 
Finite-Difference method, 717 
Finite-Element algorithm, 753 
Forward difference method, 726 
Heat Equation Backward-Difference 
algorithm, 730 
hyperbolic, 715, 739 
parabolic, 714, 725 
Poisson Equation Finite-Difference 
algorithm, 720 
Richardson’s method, 732 
Wave Equation Finite-Difference 
algorithm, 742 
Partial pivoting, 374 
Particle problem, 55, 213 
Pascal, 40 


Peano, Giuseppe, 261 
Pendulum problem, 259, 338 
Permutation matrix, 407 
Persymmetric matrix, 569 
Perturbed problem, 263 
Picard method, 265 
Piecewise cubic Hermite polynomial, 144, 
166, 280 
Piecewise linear interpolation, 144 
Piecewise Linear Rayleigh-Ritz algorithm, 
702 
Piecewise-linear basis functions, 699 
Piecewise-polynomial approximation, 144 
Pipe organ problem, 745 
Pivot element, 363 
Pivoting 
complete, 379 
maximal, 379 
partial, 374 
scaled partial, 375 
strategies, 372 
total, 379 
Plate deflection problem, 690 
Plate sinkage problem, 629, 646 
Point of singularity, 253 
Poisson equation, 713, 716 
Poisson Equation Finite-Difference 
algorithm, 720 
Poisson, Siméon-Denis, 714 
Polynomial 
algebraic, 91, 106 
Bézier, 169 
Bernstein, 117, 170 
characteristic, 350, 443 
Chebyshev, 518 
definition, 91 
evaluation, 28, 92 
Hermite, 136 
interpolating, 110 
Lagrange, 110 
Laguerre, 258, 518 
Legendre, 232, 516 
Maclaurin, 11 
monic, 521 
nested, 28, 92 
Newton, 126 
orthogonal, 510 
osculating, 136 
roots of, 92 
Taylor, 11, 106, 283 
trigonometric, 539 
zeros of, 92 
Population growth, 47, 78, 105, 116, 135, 
163, 328, 338, 450, 638 
Gompertz, 78 
logistic, 78, 328 


Positive definite matrix, 414, 416, 461, 
573, 730, 734 
Positive semidefinite matrix, 573 
Power method, 576 
Power Method algorithm, 578 
Power method for symmetric matrices, 
581 
Power series, economization of, 526 
Precision, degree of, 197 
Preconditioning, 486 
Predator-prey problem, 338 
Predictor-Corrector algorithm, 310 
Predictor-corrector method, 310 
Program 
general-purpose, 41 
special-purpose, 41 
Projectile problem, 282 
Pseudocode, 32 
P^tLU factorization, 407 


QR algorithm, 608 
QR method, 601 
QUADPACK, 259 
Quadratic convergence 
definition, 79 
of Newton’s method, 82, 639 
Steffensen’s method, 88 
Quadratic formula, 25 
Quadratic spline, 163 
Quadratic spline interpolation, 145 
Quadrature 
Gaussian, 230, 243, 248 
Gaussian-Kronrod, 259 
Quadrature formula 
degree of accuracy, 197 
degree of precision, 197 
Quadrature (see also numerical 
integration), 193 
Quasi-Newton algorithms, 648 
Quasi-Newton methods, 647 


Racquetball problem, 78 

Random walk problem, 461 

Rank of a matrix, 614 

Raphson, Joseph, 67 

Rashevsky, 276 

Rate of convergence, 37 

Rational function, 528 

Rational function approximation, 528 
Rayleigh-Ritz method, 696 

Reduced form system of equations, 359 
Region of absolute stability, 351 
Regula falsi method, 73 

Relative error, 20 

Relaxation method, 464 

Remainder term, 11 
Remez, Evgeny, 537 
Residual vector, 462, 469 
Reverse shooting method, 677 
Richardson’s extrapolation, 185, 688, 694 
Richardson’s method, 732 
Richardson, Lewis Fry, 185 
Riemann integral, 9 
Riemann, Georg Friedrich Bernhard, 9 
Ritz, Walter, 697 
Rolle’s Theorem, 4 
Rolle, Michel, 4 
Romberg algorithm, 219 
cautious, 220 
Romberg integration, 215 
Romberg, Werner, 215 
Root 
complex, 96 
definition, 48 
simple, 82 
Root-finding problem, 48 
Roots of equations 
bisection method, 48 
condition, 345 
cubic convergence, 86 
method of false position, 73 
Müller’s algorithm, 97 
Müller’s method, 96 
multiple, 82 
Newton’s method, 67 
Newton’s method for systems, 640 
Secant method, 71 
Rotation matrix, 602 
Round-off error, 18, 20, 180, 184 
Rounding arithmetic, 20 
in Maple, 22 
Row vector, 360 
Ruddy duck problem, 158 
Ruffini, Paolo, 93 
Runge, Carl, 283 
Runge-Kutta method, 283 
local truncation error, 290 
Runge-Kutta Method for Systems of 
Differential Equations algorithm, 
330 
Runge-Kutta Order Four algorithm, 288 
Runge-Kutta order four method, 288 
Runge-Kutta-Fehlberg algorithm, 297 
Runge-Kutta-Fehlberg method, 296, 356 
Runge-Kutta-Merson method, 356 
Runge-Kutta-Verner method, 301, 356 


Scalar product, 382 
Scaled partial pivoting, 375 
operation counts, 378 
Scaled-column pivoting (see Scaled 
partial pivoting), 375 


Scaling factor, 166 
Schmidt, Erhard, 515 
Schoenberg, Isaac Jacob, 145 
Schur’s Theorem, 572 
Schur, Issai, 572 
Schwarz, Hermann Amandus, 434 
Search direction, 480 
Secant algorithm, 71 
Secant method 

definition, 71 

for nonlinear boundary-value problem, 

679 

for stiff equations, 352 

order of convergence, 86 
Seidel, Phillip Ludwig, 454 
Sequence 

Fibonacci, 40 

limit of, 3, 436 
Series 

Fourier, 539 

harmonic, 40 

Maclaurin, 11 

Taylor, 11 
Set, convex, 261 
Sherman-Morrison Theorem, 649 
Shooting method 

linear equation, 674 

nonlinear equation, 678 
Significant digits, 21 
Significant figures, 21 
Signum function, 54 
Silver plate problem, 724, 759 
Similar matrices, 571 
Similarity transformation, 571 
Simple root, 82 
Simple zero, 82 
Simpson’s composite rule, 207 
Simpson’s Double Integral algorithm, 245 
Simpson’s method, 313 
Simpson’s rule, 196, 200 

adaptive, 223 

composite, 207 

error term, 200 
Simpson’s three-eighths rule, 200 
Simpson, Thomas, 196 
Singular matrix, 386 
Singular value decomposition, 614 
Singular values, 616 
Singularity, 253 
SLAP, 495 
SOR algorithm, 466 
SOR method 

definition, 464 

in heat equation, 730 

in Poisson equation, 722 
Sparse matrix, 431 
Special-purpose software, 41 
Spectral radius 
definition, 446 
relation to convergence, 448, 449 
Speed and distance problem, 143, 163 
Sphinx moth problem, 654 
Spread of contagious disease, 301 
Spring-mass problem, 229, 230 
Square matrix, 385 
Stability of initial-value techniques, 339 
Stability, round-off error, 211 
Stable algorithm, 34 
Stable method, 211, 340 
Steady state heat distribution, 713 
Steepest Descent algorithm, 658 
Steepest descent method, 481, 654 
Steffensen’s algorithm, 88 
Steffensen’s method, quadratic 
convergence, 88 
Steffensen, Johan Frederik, 88 
Stiefel, Eduard, 479 
Stein-Rosenberg Theorem, 459 
Step size, 266 
Stiff differential equation, 348 
Stirling’s formula, 132 
Stirling, James, 132, 529 
Stoichiometric equation, 293 
Strictly diagonally dominant matrix, 412, 
730, 734 
Strongly stable method, 345 
Strutt (Lord Rayleigh), John William, 696 
Sturm-Liouville system, 561 
Submatrix 
definition, 396 
leading principal, 416 
Successive over relaxation (SOR) method, 
464 
Superlinear convergence, 91, 648 
Surface area problem, 252 
Symmetric matrix, 390 
Symmetric Power Method algorithm, 581 
Synthetic division, 93 
System of differential equations, 260, 328 
System of linear equations, 357 
System of nonlinear equations, 630 


Taconite problem, 509 
Taylor method for initial-value problem, 
276 
Taylor polynomial 
in one variable, 11, 106 
in two variables, 283 
Taylor series, 11 
Taylor’s Theorem 
multiple variable, 283 
single variable, 10 


Taylor, Brook, 11 
Temperature in a cylinder problem, 
738 
Templates, 495 
Terrain vehicles problem, 78 
Test equation, 349 
Three-point formula, 178 
Total pivoting, 379 
Transformation matrix, Gaussian, 402 
Transformation, similarity, 571 
Transmission line problem, 745 
Transpose facts, 390 
Transpose matrix, 390 
Trapezoidal method, 351 
Trapezoidal rule, 194, 200 
adaptive, 230 
composite, 208 
error term, 200 
Trapezoidal with Newton Iteration 
algorithm, 352 
Triangular system of equations, 359, 
362 
Tridiagonal matrix, 730, 734 
definition, 421 
reduction to, 593 
Trigonometric interpolation, 171 


Trigonometric polynomial approximation, 


538, 539 

Triple integral, 248 

Trough problem, 55 

Truncation error, 11 

Two-point boundary-value problem, 
672 
Unconditionally stable, 729, 732 
Under relaxation method, 464 
Underflow, 19 
Unitary matrix, 572 
Unstable algorithm, 34 

Unstable method, 182, 345 

Upper Hessenberg matrix, 600, 610 
Upper triangular matrix, 386, 401 


Van der Pol equation, 684 
Variable step-size multistep method, 
315 
Variational property, 697 
Vector space, 382 
Vector(s) 
A-orthogonal set, 481 
column, 360 
convergent, 432 
convergence, 436 
definition, 360 
distance between, 435 
Euclidean norm of, 433 
l_1 norm of, 441 
l_2 norm of, 432 
l_∞ norm of, 432 
linearly dependent, 564 
linearly independent, 564 
norm equivalence of, 438 
norm of, 432 
orthogonal set, 566 
orthonormal set, 566 
residual, 462, 469 
row, 360 


Vibrating beam, 561 
Vibrating string, 715 


Viscous resistance problem, 213 


Waring, Edward, 110 
Water flow problem, 292 
Wave equation, 715 


Wave Equation Finite-Difference 


algorithm, 742 
Weak form method, 709 
Weakly stable method, 345 


Weierstrass Approximation Theorem, 106 


Weierstrass, Karl, 3, 106 
Weight function, 514 


Weighted Mean Value Theorem for 


Integrals, 10 


Well-conditioned matrix, 471 


Well-posed problem, 263 
Wielandt’s Deflation, 587 


Wielandt’s Deflation algorithm, 588 


Wielandt, Helmut, 587 


Wilkinson, James Hardy, 476, 611 
Winter moth problem, 116, 163 


Xnetlib, 44 


Zero 
complex, 96 
definition, 48 
multiplicity of, 82 
polynomial, 92 
simple, 82 


Zeroth divided difference, 125 
Index of Algorithms 


Bisection 2.1 49 

Fixed-Point Iteration 2.2 60 

Newton’s 2.3 68 

Secant 2.4 72 

False Position 2.5 74 

Steffensen’s 2.6 89 

Horner’s 2.7 95 

Müller’s 2.8 97 

Neville’s Iterated Interpolation 3.1 123 

Newton’s Interpolatory Divided-Difference 
3.2 126 

Hermite Interpolation 3.3 141 

Natural Cubic Spline 3.4 149 

Clamped Cubic Spline 3.5 155 

Bézier Curve 3.6 169 

Composite Simpson’s Rule 4.1 

Romberg 4.2 217

Adaptive Quadrature 4.3 224 

Simpson’s Double Integral 4.4 242 

Gaussian Double Integral 4.5 243 

Gaussian Triple Integral 4.6 245 

Euler’s 5.1 267 

Runge-Kutta (Order Four) 5.2 288 

Runge-Kutta-Fehlberg 5.3 297

Adams Fourth-Order Predictor-Corrector 5.4 311

Adams Variable Step-Size Predictor-Corrector 5.5 317

Extrapolation 5.6 323 

Runge-Kutta for Systems of Differential Equations 5.7 331

Trapezoidal with Newton Iteration 5.8 

Gaussian Elimination with Backward Substitution 6.1 364

Gaussian Elimination with Partial Pivoting 6.2 374

Gaussian Elimination with Scaled Partial Pivoting 6.3 376

LU Factorization 6.4 406

$LDL^t$ Factorization 6.5 417

Cholesky’s 6.6 418

Crout Factorization for Tridiagonal Linear Systems 6.7 422

Jacobi Iterative 7.1 453 

Gauss-Seidel Iterative 7.2 456 

SOR 7.3 467

Iterative Refinement 7.4 474 

Preconditioned Conjugate Gradient 7.5 487

Padé Rational Approximation 8.1 531 

Chebyshev Rational Approximation 8.2 535

Fast Fourier Transform 8.3 553 

Power 9.1 578 

Symmetric Power 9.2 581

Inverse Power 9.3 585

Wielandt Deflation 9.4 589

Householder’s 9.5 598

QR 9.6 608

Newton’s for Systems 10.1 

Broyden’s 10.2 650 

Steepest Descent 10.3 658 

Continuation 10.4 666 

Linear Shooting 11.1 674 

Nonlinear Shooting with Newton’s Method 11.2 681

Linear Finite-Difference 11.3 687 

Nonlinear Finite-Difference 11.4 693 

Piecewise Linear Rayleigh-Ritz 11.5 702 

Cubic Spline Rayleigh-Ritz 11.6 707 

Poisson Equation Finite-Difference 12.1 720

Heat Equation Backward-Difference 12.2 730

Crank-Nicolson 12.3 734

Wave Equation Finite-Difference 12.4 742 

Finite-Element 12.5 753 

Glossary of Notation

$C(X)$  Set of all functions continuous on $X$  3
$C^n(X)$  Set of all functions having $n$ continuous derivatives on $X$  4
$C^\infty(X)$  Set of all functions having derivatives of all orders on $X$  4
$\mathbb{R}$  Set of real numbers  11
$0.\overline{3}$  A decimal in which the numeral 3 repeats indefinitely  12
$fl(y)$  Floating-point form of the real number $y$  20
$O(\cdot)$  Order of convergence  37
$\lfloor x \rfloor$  Floor function, the greatest integer less than or equal to $x$  44
$\lceil x \rceil$  Ceiling function, the smallest integer greater than or equal to $x$  44
$\operatorname{sgn}(x)$  Sign of the number $x$: $1$ if $x > 0$, $-1$ if $x < 0$  54
$\Delta$  Forward difference  88
$\bar{z}$  Complex conjugate of the complex number $z$  96
$\binom{n}{k}$  The $k$th binomial coefficient of order $n$  117
$f[\cdot]$  Divided difference of the function $f$  125
$\nabla$  Backward difference  130
$\mathbb{R}^n$  Set of ordered $n$-tuples of real numbers  261
$\tau_i$  Local truncation error at the $i$th step  276
$\rightarrow$  Equation replacement  358
$\leftrightarrow$  Equation interchange  358
$(a_{ij})$  Matrix with $a_{ij}$ as the entry in the $i$th row and $j$th column  359
$\mathbf{x}$  Column vector or element of $\mathbb{R}^n$  360
$[A, \mathbf{b}]$  Augmented matrix  360
$O$  A matrix with all zero entries  382
$\delta_{ij}$  Kronecker delta: $1$ if $i = j$, $0$ if $i \neq j$  386
$I_n$  $n \times n$ identity matrix  386
$A^{-1}$  Inverse matrix of the matrix $A$  386
$A^t$  Transpose matrix of the matrix $A$  390
$M_{ij}$  Minor of a matrix  396
$\det A$  Determinant of the matrix $A$  396
$\mathbf{0}$  Vector with all zero entries  398
$\|\mathbf{x}\|$  Arbitrary norm of the vector $\mathbf{x}$  432
$\|\mathbf{x}\|_2$  The $l_2$ norm of the vector $\mathbf{x}$  432
$\|\mathbf{x}\|_\infty$  The $l_\infty$ norm of the vector $\mathbf{x}$  432
$\|A\|$  Arbitrary norm of the matrix $A$  438
$\|A\|_2$  The $l_2$ norm of the matrix $A$  439
$\|A\|_\infty$  The $l_\infty$ norm of the matrix $A$  439
$\rho(A)$  The spectral radius of the matrix $A$  446
$K(A)$  The condition number of the matrix $A$  470
$\langle \mathbf{x}, \mathbf{y} \rangle$  Inner product of the $n$-dimensional vectors $\mathbf{x}$ and $\mathbf{y}$  479
$\Pi_n$  Set of all polynomials of degree $n$ or less  513
$\tilde{\Pi}_n$  Set of all monic polynomials of degree $n$  522
$\mathcal{T}_n$  Set of all trigonometric polynomials of degree $n$ or less  539
$\mathbb{C}$  Set of complex numbers  562
$\mathbf{F}$  Function mapping $\mathbb{R}^n$ into $\mathbb{R}^n$  630
$A(\mathbf{x})$  Matrix whose entries are functions from $\mathbb{R}^n$ into $\mathbb{R}$  639
$J(\mathbf{x})$  Jacobian matrix  640
$\nabla g$  Gradient of the function $g$  655

Trigonometry

$\sin t = y$    $\cos t = x$

$\tan t = \dfrac{\sin t}{\cos t}$    $\cot t = \dfrac{\cos t}{\sin t}$

$\sec t = \dfrac{1}{\cos t}$    $\csc t = \dfrac{1}{\sin t}$

$(\sin t)^2 + (\cos t)^2 = 1$

$\sin(t_1 \pm t_2) = \sin t_1 \cos t_2 \pm \cos t_1 \sin t_2$

$\cos(t_1 \pm t_2) = \cos t_1 \cos t_2 \mp \sin t_1 \sin t_2$

$\sin t_1 \sin t_2 = \frac{1}{2}[\cos(t_1 - t_2) - \cos(t_1 + t_2)]$

$\cos t_1 \cos t_2 = \frac{1}{2}[\cos(t_1 - t_2) + \cos(t_1 + t_2)]$

$\sin t_1 \cos t_2 = \frac{1}{2}[\sin(t_1 - t_2) + \sin(t_1 + t_2)]$

Law of Sines: $\dfrac{\sin \alpha}{a} = \dfrac{\sin \beta}{b} = \dfrac{\sin \gamma}{c}$

Law of Cosines: $c^2 = a^2 + b^2 - 2ab \cos \gamma$

Common Series 

The Greek Alphabet

Alpha Α α    Eta Η η    Nu Ν ν    Tau Τ τ
Beta Β β    Theta Θ θ    Xi Ξ ξ    Upsilon Υ υ
Gamma Γ γ    Iota Ι ι    Omicron Ο ο    Phi Φ φ
Delta Δ δ    Kappa Κ κ    Pi Π π    Chi Χ χ
Epsilon Ε ε    Lambda Λ λ    Rho Ρ ρ    Psi Ψ ψ
Zeta Ζ ζ    Mu Μ μ    Sigma Σ σ    Omega Ω ω